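Using regular expressions on Linux: grep, sed, awk...
The grep family has three members:
grep (supports basic regular expressions)
egrep (supports extended regular expressions)
fgrep (supports no regular expressions; the pattern is taken literally)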
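GNU grep also accepts -E and -F as options that select the same behaviour as egrep and fgrep. The following is a minimal sketch, assuming GNU grep; the sample input and the variable names are made up for illustration.

```shell
# Hypothetical sample input: one line with a literal dot, one without.
input=$(printf 'a.c\nabc\n')

# Basic regex: '.' matches any character, so both lines are selected.
basic=$(printf '%s\n' "$input" | grep 'a.c')

# -F (fgrep behaviour): the pattern is a literal string, so only 'a.c' matches.
fixed=$(printf '%s\n' "$input" | grep -F 'a.c')

# -E (egrep behaviour): '|' is alternation without needing a backslash.
extended=$(printf '%s\n' "$input" | grep -E 'a\.c|xyz')

printf '%s\n' "$basic" "$fixed" "$extended"
```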
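1. Common grep options for regular expressions
Basic syntax of the grep command: grep [options] pattern [file…]
-E --extended-regexp
grep -E enables extended regular expressions
-P --perl-regexp
grep -P enables Perl-compatible regular expressions, which support more complex constructs
-e PATTERN, --regexp=PATTERN
Specifies the pattern to match; may be given more than once to search for several patterns
-o Print only the matched part of each line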
eg:
[root@localhost day07]# echo "a123" | grep -o '123'
123
-n Show line numbers
-i Ignore case
eg:
[root@localhost day07]# echo "A123" | grep -i "a"
A123
[root@localhost day07]# echo "A123" | grep -io "a"
A
-v Invert the selection: print the lines that do not match the pattern
eg1:
[root@localhost day07]# printf "a123\nb123\n" | grep -v "a"
b123
eg2:
[root@localhost day07]# printf "a123\nb123\n\nd123\n"
a123
b123

d123
[root@localhost day07]# printf "a123\nb123\n\nd123\n" | grep -v "^$"
a123
b123
d123
-w Match whole words only
eg:
[root@localhost day07]# printf "apple\napple\nbanana\n" | grep -w "apple"
apple
apple
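The options above can be combined, and -n and -e have no transcript of their own, so here is a small sketch of both. It assumes GNU grep; the inputs and variable names are invented for the example.

```shell
# -n prefixes each matching line with its line number.
numbered=$(printf 'a123\nb123\nc123\n' | grep -n 'b')

# Repeating -e supplies several patterns; a line matching any of them is printed.
either=$(printf 'a123\nb123\nc123\n' | grep -e '^a' -e '^c')

# Options combine: -i ignores case, -w requires a whole word, -n adds line numbers.
combined=$(printf 'Apple pie\npineapple\n' | grep -inw 'apple')

printf '%s\n' "$numbered" "$either" "$combined"
```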
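2. Basic regular expressions
A regular expression is made up of: 1. metacharacters (special characters with special meanings); 2. escape sequences (\A, \b, \B...); 3. groups (grouping lets you extract part of the text)
^ : anchors the match at the start of the line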
eg:
[root@localhost day07]# printf "a123\na234\nb123\nb234\n" | grep "^a"
a123
a234
$ : anchors the match at the end of the line
eg:
[root@localhost day07]# printf "a123\na234\nb123\nb234\n" | grep "3$"
a123
b123
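Used together, the two anchors constrain the whole line. A minimal sketch, assuming GNU grep and made-up input:

```shell
# '^a$' selects only lines that consist of exactly one 'a'.
exact=$(printf 'a\na123\n123a\n' | grep '^a$')

# '^$' selects empty lines; with -n the output shows where they are.
blank=$(printf 'x\n\ny\n' | grep -n '^$')

printf '%s\n' "$exact" "$blank"
```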
. : matches any single character
eg:
[root@localhost day07]# printf "a123\na234\nb123\nb234\n" | grep ".1"
a123
b123
* : matches the preceding expression zero or more times
eg1:
[root@localhost day07]# printf "a3\na33\nb333\nb222\n" | grep "3*"
a3
a33
b333
b222
[root@localhost day07]# printf "a3\na33\nb333\nb222\n" | grep -o "3*"
3
33
333
eg2:
[root@localhost day07]# echo "abcdabcdabcd"| grep "abc*"
abcdabcdabcd
[root@localhost day07]# echo "abcdabcdabcd"| grep -o "abc*"
abc
abc
abc
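Because * also accepts zero repetitions, a pattern such as "3*" selects every line. In a basic regular expression, one common way to require at least one occurrence is to write the item twice. A small sketch under that assumption (GNU grep, invented input):

```shell
# '3*' can match the empty string, so both lines are selected.
all=$(printf 'a3\nb222\n' | grep '3*')

# '33*' means one '3' followed by zero or more '3', i.e. at least one '3'.
some=$(printf 'a3\nb222\n' | grep '33*')

printf '%s\n' "$all" "$some"
```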
[] : matches any single character listed inside the brackets
#eg1:
[root@localhost day07]# printf "a3\nab33\nb333\n" | grep -o "[abc]"
a
a
b
b
[root@localhost day07]# printf "a3\nab33\nb333\n" | grep "[abc]"
a3
ab33
b333
#eg2:
[root@localhost day07]# printf "a123\nb123\nab123\nc123\n" | grep -o "[ab]123"
a123
b123
b123
[^str] : matches the complement of str, i.e. any single character not listed in str
eg:
[root@localhost day07]# printf "a3\nab33\nb333\n" | grep "[^a]"
a3
ab33
b333
[root@localhost day07]# printf "a3\nab33\nb333\n" | grep -o "[^a]"
3
b
3
3
b
3
3
3
[a1-a2] : a range; a1 through a2 must be contiguous, for example [0-9], [a-z], [A-Z]
eg:
[root@localhost day07]# printf "a123\nab345\nB567\n" | grep "[0-9]"
a123
ab345
B567
[root@localhost day07]# printf "a123\nab345\nB567\n" | grep "[A-Z]"
B567
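Several ranges can share one bracket expression, and a leading ^ negates the whole set. The sketch below assumes GNU grep; the input strings are hypothetical.

```shell
# [0-9a-f] combines two ranges; anchoring with ^...$ keeps only lines
# made up entirely of lowercase hexadecimal digits.
hex=$(printf 'ff00\nxyz\n12ab\n' | grep '^[0-9a-f]*$')

# [^0-9] matches any single character that is not a digit.
nondigits=$(echo 'a1b2' | grep -o '[^0-9]')

printf '%s\n' "$hex" "$nondigits"
```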
\b, \< : the characters that follow must appear at the start of a word
eg:
[root@localhost day07]# echo "hello world" | grep "\bhello"
hello world
[root@localhost day07]# echo "hello world" | grep -o "\bhello"
hello
[root@localhost day07]# echo "hello world" | grep -o "\<hello"
hello
\b, \> : the characters that precede must appear at the end of a word
eg:
[root@localhost day07]# echo "hello world" | grep -o "hello\>"
hello
[root@localhost day07]# echo "helloworld" | grep -o "hello\>"
[root@localhost day07]# echo "helloworld" | grep -o "hello\b"
# Punctuation is not a word character, so there is a word boundary between "hello" and "~"
[root@localhost day07]# echo "hello~world" | grep -o "hello\b"
hello
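Wrapping a pattern in \< and \> gives the same selection as the -w option for a simple word. A minimal sketch, assuming GNU grep and made-up input:

```shell
# Hypothetical input: "hello" as a separate word, and as part of a longer word.
text=$(printf 'hello world\nhelloworld\n')

# Explicit word anchors around the pattern.
with_anchors=$(printf '%s\n' "$text" | grep '\<hello\>')

# -w applies the whole-word test to the entire pattern.
with_w=$(printf '%s\n' "$text" | grep -w 'hello')

printf '%s\n' "$with_anchors" "$with_w"
```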
Examples 1-4
Example 1: select hello preceded by x, y, or z
[root@localhost day07]# printf "xhello\nyhello\nzhello" | grep "[xyz]hello"
xhello
yhello
zhello
Example 2: select hello not preceded by a, b, or c
[root@localhost day07]# printf "xhello\nyhello\nzhello" | grep -v "[abc]hello"
xhello
yhello
zhello
[root@localhost day07]# printf "xhello\nyhello\nzhello" | grep "[^abc]hello"
xhello
yhello
zhello
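Example 3: extract only hello from the string (here * applies only to the final o, so "hello*" means "hell" followed by zero or more o)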
[root@localhost day07]# printf "xhelloz\nyhelloz\nzhelloz" | grep "hello*"
xhelloz
yhelloz
zhelloz
[root@localhost day07]# printf "xhelloz\nyhelloz\nzhelloz" | grep -o "hello*"
hello
hello
hello
Example 4: extract the whole line that contains hello
[root@localhost day07]# printf "xhelloz\nyhelloz\nzhelloz\nhi\n" | grep -o ".*hello.*"
xhelloz
yhelloz
zhelloz
[root@localhost day07]# printf "xhelloz\nyhelloz\nzhelloz\nhi\n"
xhelloz
yhelloz
zhelloz
hi
-
Regular expression character classes
[[:alnum:]] : matches any single letter or digit; equivalent to [A-Za-z0-9]
eg:
[root@localhost day07]# echo "goodAB1234" | grep "[[:alnum:]]*"
goodAB1234
[[:digit:]] : matches any single digit
eg:
[root@localhost day07]# echo "123456" | grep "[[:digit:]]*"
123456
[[:lower:]] : matches a lowercase letter; equivalent to [a-z]
eg:
[root@localhost day07]# echo "123abcde" | grep "[[:digit:][:lower:]]*"
123abcde
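[[:upper:]] : matches any single uppercase letter; equivalent to [A-Z]
[[:alpha:]] : matches any single letter; equivalent to [A-Za-z]
[[:space:]] : matches any single whitespace character, including space, tab, newline, and form feed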
[root@localhost day07]# printf "111\nAAA" | grep "[[:digit:][:space:][:upper:]]"
111
AAA
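[[:blank:]] : matches a space or a tab
[[:graph:]] : matches any single visible printable character; whitespace is excluded
[[:print:]] : matches any single printable character, including the space character, but not control characters such as the string terminator '\0'
[[:cntrl:]] : matches any single control character, i.e. ASCII codes 0-31 and 127 (DEL), for example newline and tab
[[:punct:]] : matches any single punctuation character, for example "[]", "{}", or ","
[[:xdigit:]] : matches a hexadecimal digit, i.e. 0-9, a-f, and A-F
Example 1
Print the quoted content "hello world"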
[root@localhost day07]# echo "hello world" | grep "[[:alnum:]]"
hello world
[root@localhost day07]# echo "hello world" | grep "hello.*world"
hello world
[root@localhost day07]# echo "hello world" | grep "hello *world"
hello world
[root@localhost day07]# echo "hello world" | grep "hello[[:space:]]*world"
hello world
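Some of the classes listed above have no transcript, so the following sketch exercises [[:upper:]], [[:punct:]], and [[:digit:]] with -o. It assumes GNU grep; the sample strings are invented.

```shell
# Each uppercase letter is printed on its own line.
upper=$(echo 'Hello, World!' | grep -o '[[:upper:]]')

# Punctuation characters only.
punct=$(echo 'Hello, World!' | grep -o '[[:punct:]]')

# Runs of digits; -o skips the empty matches that '*' allows.
digits=$(echo 'tel 555 1234' | grep -o '[[:digit:]]*')

printf '%s\n' "$upper" "$punct" "$digits"
```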
3. Extended regular expressions
grep [options] pattern [file…]; here options must include at least -E or -P
+ : matches the preceding item one or more times
eg:
[root@localhost day07]# echo "aabbbb" | grep -Po "a+"
aa
[root@localhost day07]# echo "aaaaaa" | grep -P "a+"
aaaaaa
? : matches the preceding item zero or one time
eg:
[root@localhost day07]# echo "abbbb" | grep -P "a?"
abbbb
[root@localhost day07]# echo "bbbb" | grep -P "a?"
bbbb
[root@localhost day07]# echo "aaabbbb" | grep -Po "a?"
a
a
a
(s|t) : matches either s or t
eg:
[root@localhost day07]# echo "0412" | grep -Eo "040[1-9]|041[0-9]"
0412
[root@localhost day07]# echo "0412" | grep -Eo "^(040[1-9]|041[0-9])$"
0412
[root@localhost day07]# echo "04122" | grep -Eo "^(040[1-9]|041[0-9])$"
{n} : matches the preceding item exactly n times
eg:
[root@localhost day07]# echo "aaaaaa" | grep -E "a{4}"
aaaaaa
[root@localhost day07]# echo "aaaaaa" | grep -Eo "a{4}"
aaaa
[root@localhost day07]# echo "aaaaaa" | grep -P "a{4}"
aaaaaa
[root@localhost day07]# echo "aaaaaa" | grep -Po "a{4}"
aaaa
{n,m} : matches the preceding item n to m times, as many times as possible
eg:
[root@localhost day07]# echo "aaaaaa" | grep -Po "a{2,5}"
aaaaa
{n,} : matches the preceding item n or more times
eg:
[root@localhost day07]# echo "aaaaaaaaaa" | grep -Eo "a{10,}"
aaaaaaaaaa
{,k} : matches the preceding item at most k times (a GNU extension under -E)
eg:
[root@localhost day07]# echo "aaaaaaaaaa" | grep -Eo "a{,8}"
aaaaaaaa
aa
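Interval quantifiers are often combined with anchors to validate a fixed layout. The sketch below assumes GNU grep; the "three digits, dash, four digits" format is a hypothetical example, not something the notes above prescribe.

```shell
# Only the line with exactly 3 digits, a dash, and exactly 4 digits passes.
valid=$(printf '555-1234\n55-1234\n555-12345\n' | grep -E '^[0-9]{3}-[0-9]{4}$')

# {2,3} is greedy: seven 'a' characters split into 'aaa', 'aaa',
# and a leftover single 'a' that is too short to match.
chunks=$(echo 'aaaaaaa' | grep -Eo 'a{2,3}')

printf '%s\n' "$valid" "$chunks"
```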
-
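Difference between greedy and non-greedy mode
Greedy mode: when matching with (*, +, ?, {n,m}, {n,}, {,m}), match as much as possible
Non-greedy mode: match as little as possible (*?, +?, ??, {n,m}?, {n,}?, {,m}?); the match stops as soon as it succeeds (for example, a* matches zero or more times, so in non-greedy mode zero times already counts as a match)
Note: the non-greedy quantifiers require the -P option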
eg:
[root@localhost day07]# echo "111111" | grep -Po "1*?"
[root@localhost day07]# echo "111111" | grep -Po "1+?"
1
1
1
1
1
1
[root@localhost day07]# echo "111111" | grep -Po "1??"
[root@localhost day07]# echo "111111" | grep -Po "1{,5}?"
[root@localhost day07]# echo "111111" | grep -Po "1{3,}?"
111
111
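The difference is easiest to see when the pattern has a closing delimiter. A minimal sketch, assuming GNU grep built with -P support; the tag-like input is made up.

```shell
# Greedy: '.*' runs to the last '>' on the line, giving one long match.
greedy=$(echo '<a><b>' | grep -Po '<.*>')

# Non-greedy: '.*?' stops at the first '>' each time, giving two matches.
lazy=$(echo '<a><b>' | grep -Po '<.*?>')

printf '%s\n' "$greedy" "$lazy"
```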
-
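Difference between * and +
* matches zero or more times, while + matches one or more times. So when the item before * does not occur in the line, the pattern still matches (with an empty match) and the original line is printed, whereas the same pattern with + prints nothing.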
[root@localhost day07]# echo "bbbb" | grep -P "a+"
[root@localhost day07]# echo "bbbb" | grep -P "a*"
bbbb
4. Escape sequences
\A : matches at the start of the string
eg:
[root@localhost day07]# echo "hello world" | grep -P "\Ahello"
hello world
[root@localhost day07]# echo "hello world" | grep -Po "\Ahello"
hello
\b : matches a word boundary, i.e. the empty position at the start or end of a word
eg1:
Match both the start and the end of a word
[root@localhost day07]# echo "hello world" | grep -Po "\bhello\b"
hello
eg2: match the start of a word
[root@localhost day07]# echo "computer" | grep -Po "\bcom"
com
eg3: match the end of a word
[root@localhost day07]# echo "computer" | grep -Po "ter\b"
ter
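eg4: match a space between two word boundaries (with -o the printed match is a single space, so the output looks blank)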
[root@localhost day07]# echo "hello world" | grep -Po "\b \b"
[root@localhost day07]# echo "hello world" | grep -P "\b \b"
hello world
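\B : matches a position that is not a word boundary
eg:
world\B matches only when world is followed by another word character, so only the second command prints the line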
[root@localhost day07]# echo "hello world" | grep -P "world\B"
[root@localhost day07]# echo "hello world123" | grep -P "world\B"
hello world123
\d : matches a digit, [0-9]
eg:
[root@localhost day07]# echo "12 hello 12" | grep -Po "\d"
1
2
1
2
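\D : matches any non-digit character
eg:
The space is also matched; with -o it is printed as a whitespace-only line ahead of h and i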
[root@localhost day07]# echo "123 hi" | grep -Po "\D"
h
i
\s : matches whitespace such as space, tab, and newline
eg:
Match a tab
[root@localhost day07]# printf "hello\tworld" | grep -Po ".*\s.*"
hello world
eg: match a space
[root@localhost day07]# echo "hello world" | grep -Po ".*\s.*"
hello world
eg: match a newline (-z makes grep treat the whole input as one record, so \s can see the newline)
[root@localhost day07]# printf "hello\nworld\n" | grep -Pzo "\w+\s\w+"
hello
world
\S : matches any non-whitespace character
eg:
[root@localhost day07]# echo "123hello456" | grep -Po "\d{3}\S+\d{3}"
123hello456
[root@localhost day07]# echo "123 456" | grep -Po "\d{3}\S+\d{3}"
\w : matches digits, letters, and underscore
eg:
[root@localhost day07]# echo "123 45_6" | grep -Po "\w+\s\w+"
123 45_6
\W : matches the complement of \w
eg:
[root@localhost day07]# echo "123? ?456" | grep -Po "\d+\W+\d+"
123? ?456
\Z : matches at the end of the string, similar to $
eg:
[root@localhost day07]# echo "hello world" | grep -Po "world\Z"
world
[root@localhost day07]# echo "hello world" | grep -Po "hello\Z"
[root@localhost day07]#
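The escape sequences above are usually combined in one pattern. The following sketch assumes GNU grep with -P support; the log-like text and the date layout are invented for illustration.

```shell
# \d{4}-\d{2}-\d{2} between word boundaries picks out a date-shaped token.
date_found=$(echo 'released 2024-01-15 build 42' | grep -Po '\b\d{4}-\d{2}-\d{2}\b')

# \w+ treats the underscore as a word character but stops at '-'.
words=$(echo 'foo_bar baz-1' | grep -Po '\w+')

printf '%s\n' "$date_found" "$words"
```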
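5. Groups
Syntax: (). Multiple pairs of parentheses in one regular expression define multiple groups, e.g. a(b(c(d)))e, which contains three groups.
Groups can be referenced in two ways. Group name: assigned manually. Group number: generated automatically by counting left parentheses from left to right; the first left parenthesis is group 1, the second is group 2, and so on. Group 0 is the entire matched string.
To reference a group by number, use \number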
(...)
eg:
[root@localhost day07]# echo "abcdebcd" | grep -P "a(b(c(d)))e\1"
abcdebcd
[root@localhost day07]# echo "abcdebcd" | grep -P "a(b(c(d)))e\1\2"
[root@localhost day07]# echo "abcdebcdcd" | grep -P "a(b(c(d)))e\1\2"
abcdebcdcd
[root@localhost day07]# echo "abcdebcdcdd" | grep -P "a(b(c(d)))e\1\2\3"
abcdebcdcdd
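A typical use of a numbered back-reference is spotting repeated text. A minimal sketch, assuming GNU grep with -P support; the sentences are made up.

```shell
# (\w+) captures a word, and \1 requires the same word again after a space.
doubled=$(echo 'this is is a test' | grep -Po '\b(\w+) \1\b')

# The same idea without spaces: 'abc' immediately followed by itself.
repeated=$(echo 'xxabcabcxx' | grep -Po '(abc)\1')

printf '%s\n' "$doubled" "$repeated"
```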
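(?…) : prefix for extension constructs; the characters after ? select the construct (see the forms below). Placed directly after an item, a bare ? is instead the quantifier that matches the item zero or one time, as in: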
eg:
[root@localhost day07]# echo "hello" | grep -P 'h?'
hello
[root@localhost day07]# echo "hello" | grep -P 'l?'
hello
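(?:…) : non-capturing group; it gets no number and cannot be back-referenced (so in the example below \2 refers to (d), not to (c(d)))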
eg:
[root@localhost day07]# echo "abcdebcdcd" | grep -P "a(b(?:c(d)))e\1\2"
[root@localhost day07]#
(?P<name>…) : names the group
A named group can still be referenced by number with \num
or by name with (?P=name)
eg:
[root@localhost day07]# echo "abcdebcd" | grep -P "a(?P<g1>b(c(d)))e\1"
abcdebcd
[root@localhost day07]# echo "abcdebcd" | grep -P "a(?P<g1>b(c(d)))e(?P=g1)"
abcdebcd
(?#…) : a comment; it takes no part in matching
eg:
[root@localhost day07]# echo "abcdebcd" | grep -P "a(?#group1)(b(c(d)))e\1"
abcdebcd
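(?=…) : positive lookahead; it does not consume the text inside the group, which serves only as a condition and is not part of the returned match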
eg:
[root@localhost day07]# echo "windows10" | grep -P "windows(?=10)"
windows10
[root@localhost day07]# echo "windows10" | grep -P "windows(?=10|11)"
windows10
[root@localhost day07]# echo "windows11" | grep -P "windows(?=10|11)"
windows11
[root@localhost day07]# echo "windows8" | grep -P "windows(?=10|11)"
[root@localhost day07]#
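(?!…) : negative lookahead
eg:
Note: wrap the pattern in single quotes for grep; inside double quotes an interactive bash treats ! as history expansion
[root@localhost day07]# echo "windows8" | grep -P 'windows(?!8)'
[root@localhost day07]# echo "windows10" | grep -P 'windows(?!8)'
windows10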
(?<=…) : positive lookbehind; it does not consume the text inside the group
eg:
[root@localhost day07]# echo "Centos7" | grep -P '(?<=Centos)7'
Centos7
[root@localhost day07]# echo "Redhat7" | grep -P '(?<=Centos)7'
[root@localhost day07]#
(?<!…) : negative lookbehind
eg:
[root@localhost day07]# echo "Centos7" | grep -P '(?<!Centos)7'
[root@localhost day07]# echo "Redhat7" | grep -P '(?<!Centos)7'
Redhat7
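With -o, the effect of lookarounds becomes visible: the text inside the lookaround is checked but left out of the printed match. A small sketch, assuming GNU grep with -P support; the key names and file names are hypothetical.

```shell
# Lookbehind: require 'version=' before the match, but do not print it.
version=$(echo 'version=1.2.3' | grep -Po '(?<=version=)[0-9.]+')

# Lookahead: require 'USD' after the digits, but do not print it.
amount=$(echo 'price: 42USD' | grep -Po '\d+(?=USD)')

# Negative lookahead: keep names starting with 'foo.' unless 'bak' follows.
kept=$(printf 'foo.txt\nfoo.bak\n' | grep -P '^foo\.(?!bak)')

printf '%s\n' "$version" "$amount" "$kept"
```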
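(?(id/name)yes-pattern|no-pattern) : conditional; matches yes-pattern if the group with the given id or name took part in the match, otherwise no-pattern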
eg:
[root@localhost day07]# echo "root@123" | grep -Po '(<)?(root@123)(?(1)>|$)'
root@123
[root@localhost day07]# echo "<root@123>" | grep -Po '(<)?(root@123)(?(1)>|$)'
<root@123>
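One caveat with the unanchored pattern: on an unbalanced input such as "<root@123", the engine can skip the "<" and still match the rest. Adding ^ is one way to reject such input. This is a sketch under the assumption of GNU grep with -P support; the variable names are illustrative.

```shell
# Unanchored: the match may start after '<', where group 1 is unset and '$' succeeds.
loose=$(echo '<root@123' | grep -Po '(<)?(root@123)(?(1)>|$)')

# Anchored with ^: '<' must be matched by group 1, so a closing '>' is required.
# '|| true' keeps the script going when grep finds nothing (exit status 1).
strict_bad=$(echo '<root@123' | grep -Po '^(<)?(root@123)(?(1)>|$)' || true)
strict_ok=$(echo '<root@123>' | grep -Po '^(<)?(root@123)(?(1)>|$)')

printf 'loose=%s strict_bad=%s strict_ok=%s\n' "$loose" "$strict_bad" "$strict_ok"
```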