The Three Swordsmen of Linux: An Introduction to grep, sed, and awk

Article link: https://blog.csdn.net/DanielSYC/article/details/124436695

grep searches files for lines that match a pattern:

$ grep 'ui' id.txt

oldboy 57089234758ui
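# Syntax: grep [-nviwcE] 'RE' FILE...

Options:

Option	Meaning
-n	--line-number: prefix each match with its line number
-v	--invert-match: print the lines that do not match
-i	--ignore-case
-w	--word-regexp: match whole words only
-c	--count: print the number of matching lines
-E	--extended-regexp: use extended regular expressions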
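The options above can be combined freely. As a minimal sketch, the commands below run against a throwaway file whose contents are made up for illustration (they are not taken from the article):

```shell
# Hypothetical sample data, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'Error: disk full' 'error: retry' 'all good' 'terror level' > "$sample"

numbered=$(grep -n 'error' "$sample")     # matching lines, prefixed with line numbers
inverted=$(grep -v 'error' "$sample")     # lines that do NOT contain "error"
any_case=$(grep -ci 'error' "$sample")    # number of matching lines, ignoring case
whole_word=$(grep -cw 'error' "$sample")  # number of lines with "error" as a whole word

printf '%s\n' "$numbered" '--' "$inverted" '--' "$any_case" "$whole_word"
rm -f "$sample"
```

Note that 'terror level' matches the plain pattern but not the -w variant, because there "error" is only part of a longer word.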

Regular expressions

`^` start of line

^x matches lines that begin with x: grep '^my' oldboy.txt

$ cat -n oldboy.txt 
     1	I am oldboy teacher!
     2	I teach linux.
     3	
     4	I like badminton ball ,billiard ball and chinese chess!
     5	my blog is http://oblboy.blog.51cto.com
     6	our size is http://blog.oldboyedu.com
     7	my qq is 49000448
     8	
     9	not 4900000448
    10	my god ,i am not oldbey,but OLDBOY!
    
$ grep -n '^my' oldboy.txt
5:my blog is http://oblboy.blog.51cto.com
7:my qq is 49000448
10:my god ,i am not oldbey,but OLDBOY!

Contents of oldboy.txt:

I am oldboy teacher!
I teach linux.

I like badminton ball ,billiard ball and chinese chess!
my blog is http://oblboy.blog.51cto.com
our size is http://blog.oldboyedu.com
my qq is 49000448

not 4900000448
my god ,i am not oldbey,but OLDBOY!

`$` end of line

x$ matches lines that end with x: grep '448$' oldboy.txt

$ grep '448$' oldboy.txt
my qq is 49000448
not 4900000448

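cat -A makes invisible characters visible. Each line end is shown as $, so empty lines and trailing spaces are easy to spot: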

$ cat -A oldboy.txt 
I am oldboy teacher!$
I teach linux.$
$
I like badminton ball ,billiard ball and chinese chess!$
my blog is http://oblboy.blog.51cto.com $
our size is http://blog.oldboyedu.com $
my qq is 49000448$
$
not 4900000448$
my god ,i am not oldbey,but OLDBOY!$

`^$` empty line

^$

$ grep -n '^$' oldboy.txt 
3:
8:

Exclude empty lines: grep -v '^$' oldboy.txt
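The cat -A output above shows that two lines of oldboy.txt end with a space. Such trailing whitespace defeats a `$` anchor. The sketch below uses a made-up three-line file (not from the article) to show the effect and one possible workaround with a POSIX character class:

```shell
# Hypothetical sample: the first line ends with a space, the third line is empty.
f=$(mktemp)
printf 'alpha.com \nbeta.com\n\n' > "$f"

strict=$(grep -c 'com$' "$f")               # "com" must be the very last characters
lenient=$(grep -c 'com[[:space:]]*$' "$f")  # tolerate whitespace before the line end
empty=$(grep -c '^$' "$f")                  # nothing between line start and line end

echo "strict=$strict lenient=$lenient empty=$empty"
rm -f "$f"
```

Here the strict pattern counts one line, the lenient one counts two, and one line is empty.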

`.` any single character

It does not match an empty line, because an empty line contains no character at all.

$ grep '.' oldboy.txt 
I am oldboy teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
my blog is http://oblboy.blog.51cto.com 
our size is http://blog.oldboyedu.com 
my qq is 49000448
not 4900000448
my god ,i am not oldbey,but OLDBOY!

`\` escape character

$ grep '\.$' oldboy.txt 
I teach linux.

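`*` zero or more occurrences of the preceding character

grep '0*' oldboy.txt matches the digit 0 repeated zero or more times. Because zero occurrences also count as a match, this pattern matches every line.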
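Since `*` also accepts zero repetitions, a pattern such as '0*' cannot be used on its own as a filter. A small sketch on made-up input (an assumption for illustration) compares it with '00*', which requires at least one 0:

```shell
# Hypothetical sample: one line with zeros, one without, one empty line.
f=$(mktemp)
printf 'my qq is 49000448\nno zero here\n\n' > "$f"

all_lines=$(awk 'END{print NR}' "$f")   # total number of lines in the file
zero_or_more=$(grep -c '0*' "$f")       # matches every line, even the empty one
one_or_more=$(grep -c '00*' "$f")       # needs at least one literal 0

echo "$all_lines $zero_or_more $one_or_more"
rm -f "$f"
```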

`.*` matches anything

grep '.*' oldboy.txt

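On every line, match everything up to a letter t: grep '^.*t' oldboy.txt

Regular expressions are greedy: .* extends as far as it can, so '^.*t' matches from the start of the line through the last t on that line.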
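Greediness is easiest to see with grep -o, which prints only the matched part. The following is a sketch on a made-up sentence. The second pattern is one common way to stop at the first t instead, by forbidding t inside the repeated part:

```shell
# Hypothetical input line containing several letters t.
line='cat sat on the mat today'

# Greedy: .* runs to the LAST t on the line.
greedy=$(printf '%s\n' "$line" | grep -o '^.*t')

# Stop at the FIRST t by excluding t from the repeated characters.
shortest=$(printf '%s\n' "$line" | grep -o '^[^t]*t')

printf '[%s]\n[%s]\n' "$greedy" "$shortest"
```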

`[abc]` matches any one of a, b, or c

grep '[abc]' oldboy.txt

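Match lowercase letters: grep '[a-z]' oldboy.txt
Match uppercase letters: grep '[A-Z]' oldboy.txt
Match digits: grep '[0-9]' oldboy.txt

Match all letters (either case) and digits:

grep '[a-zA-Z0-9]' oldboy.txt
grep '[a-Z0-9]' oldboy.txt	# locale-dependent; fails with "Invalid range end" in the C locale
grep -i '[a-z0-9]' oldboy.txt	# -i ignores case

Note that "or" inside a bracket expression is expressed simply by writing the characters or ranges next to each other; no explicit operator is needed.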
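Because ranges such as a-Z depend on the locale, POSIX character classes are a more predictable alternative. The sketch below pins the locale with LC_ALL=C and uses a made-up four-line file. Both choices are assumptions for the demonstration:

```shell
# Hypothetical sample: lowercase, uppercase, digits, and punctuation only.
f=$(mktemp)
printf '%s\n' 'abc' 'ABC' '123' '---' > "$f"

alnum=$(LC_ALL=C grep -c '[[:alnum:]]' "$f")     # letters or digits
upper=$(LC_ALL=C grep -c '[[:upper:]]' "$f")     # uppercase letters only
explicit=$(LC_ALL=C grep -c '[a-zA-Z0-9]' "$f")  # same set as [[:alnum:]] in the C locale

echo "$alnum $upper $explicit"
rm -f "$f"
```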

`[^abc]` excludes (negates) a, b, and c

grep '[^abc]' oldboy.txt

Lines that do not start with m or n: grep '^[^mn]' oldboy.txt

Lines that do not end with a digit: grep '[^0-9]$' oldboy.txt
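A negated bracket expression still has to match one character, so '^[^mn]' silently skips empty lines, whereas inverting the whole match with -v keeps them. A minimal sketch on made-up input:

```shell
# Hypothetical sample: three non-empty lines followed by one empty line.
f=$(mktemp)
printf 'my blog\nnot here\nour site\n\n' > "$f"

negated_class=$(grep -c '^[^mn]' "$f")  # first character exists and is not m or n
inverted=$(grep -vc '^[mn]' "$f")       # every line that does not start with m or n

echo "$negated_class $inverted"
rm -f "$f"
```

The first count is 1 (only 'our site'), the second is 2 because the empty line is included.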

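Basic regular expression summary

Basic RE	Meaning
^	starts with ...
$	ends with ...
^$	empty line
.	any single character
*	zero or more occurrences of the preceding character
.*	any sequence of characters, including none (so it also matches empty lines)
\	escape character
[]	any one of the listed characters
[^]	any one character that is not listed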

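Extended regular expressions:

`+` one or more occurrences of the preceding character

Enable extended regular expressions in grep with -E: grep -E '0+' oldboy.txt

Match every run of digits or every lowercase word in the file: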

grep -E '[0-9]+' oldboy.txt
grep -E '[a-z]+' oldboy.txt

`|` alternation (or)

grep -E 'oldboy|oldbey' oldboy.txt

$ grep -E 'oldboy|oldbey' oldboy.txt 
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!

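Difference from []: [] picks a single character from a set, whereas | picks one whole alternative (for example, a word) from several.

`()` groups characters into a single unit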

grep -E 'oldb(o|e)y' oldboy.txt

$ grep -E 'oldboy|oldbey' oldboy.txt 
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!

$ grep "oldb[oe]y" oldboy.txt
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!

$ grep -E 'oldb(o|e)y' oldboy.txt
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!

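`x{n,m}` the preceding character x occurs n to m times

grep -E '0{3,4}' oldboy.txt: 0 occurs at least 3 and at most 4 times in a row
grep -E '0{3}' oldboy.txt: 0 occurs exactly 3 times in a row
grep -E '0{3,}' oldboy.txt: 0 occurs at least 3 times in a row
grep -E '0{,3}' oldboy.txt: 0 occurs at most 3 times in a row (the {,m} form is a GNU extension)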
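One caveat: without anchoring, '0{3}' also matches inside a longer run of zeros, because any three of them satisfy the pattern. The sketch below, on made-up numbers, contrasts this with one possible way to require a run of exactly three:

```shell
# Hypothetical sample: runs of three, five, and two zeros.
f=$(mktemp)
printf '%s\n' '49000448' '4900000448' '4900448' > "$f"

# Any line containing three consecutive zeros, even inside a longer run.
loose=$(grep -cE '0{3}' "$f")

# Exactly three: the run must be bounded by a non-zero character or a line edge.
strict=$(grep -cE '(^|[^0])0{3}([^0]|$)' "$f")

echo "$loose $strict"
rm -f "$f"
```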

`?` zero or one occurrence of the preceding character

grep -E 'go?d' oldboy.txt

$ grep -E 'o?bl' oldboy.txt 
my blog is http://oblboy.blog.51cto.com
our size is http://blog.oldboyedu.com

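Extended regular expression summary

Symbol	Meaning
+	one or more occurrences of the preceding character
|	alternation (or)
()	grouping; in sed, also a capture group
{n,m}	at least n and at most m consecutive occurrences
?	zero or one occurrence of the preceding character

Match an ID card number (17 digits followed by a digit or X):

grep -E '[0-9]{17}[0-9X]' id.txt

Exclude empty lines and lines that contain #:

grep -E -v '^$|#' file.txt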
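The pattern '^$|#' drops every line that contains a # anywhere, including data lines where # is part of a value. As a sketch, the stricter variant below only drops lines that are blank or whose first non-blank character is #. The sample config is invented for illustration:

```shell
# Hypothetical config file with comments, a blank line, and a URL containing '#'.
conf=$(mktemp)
printf '%s\n' '# comment' 'port=22' '' '  # indented comment' 'url=http://example.com/#top' > "$conf"

broad=$(grep -Ev '^$|#' "$conf")                  # removes any line containing '#'
narrow=$(grep -Ev '^[[:space:]]*(#|$)' "$conf")   # removes only comment lines and blank lines

printf '%s\n' "$broad" '--' "$narrow"
rm -f "$conf"
```

Which of the two is appropriate depends on whether inline # characters can occur in the data.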

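The three swordsmen:

Command	Typical use
grep	filtering
sed	substitution; editing file contents; selecting lines
awk	selecting columns; counting and arithmetic

grep

Option	Meaning
-A	after: also print N lines after each match
-B	before: also print N lines before each match
-C	context: also print N lines before and after each match
-c	count
-v	invert
-n	line number
-i	ignore case
-w	word-regexp: whole-word match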

$ seq 10 | grep -A5 2
2
3
4
5
6
7

$ seq 10 | grep -B3 9
6
7
8
9

$ seq 10 | grep -C2 5
3
4
5
6
7

$ grep -A3 '^our' oldboy.txt 
our size is http://blog.oldboyedu.com
my qq is 49000448

not 4900000448

-c counts the matching lines

$ ps -ef | grep sshd | wc -l
2

$ ps -ef | grep -c sshd
2

-v inverts the match, which is handy for dropping the grep process itself from ps output: ps -ef | grep crond | grep -v grep

$ ps -ef | grep fuck | grep -vc 'grep'
0
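A widely used alternative to the extra grep -v grep is to wrap one character of the pattern in brackets. The regex '[s]shd' still matches "sshd", but the text "[s]shd" on grep's own command line does not match it. The sketch below simulates ps output with made-up lines so that it does not depend on the processes actually running:

```shell
# Simulated `ps -ef` output (invented lines); the second line stands for grep itself.
plain_ps=$(printf '%s\n' 'root  812  1 /usr/sbin/sshd -D' 'me   4321 99 grep sshd')
trick_ps=$(printf '%s\n' 'root  812  1 /usr/sbin/sshd -D' 'me   4321 99 grep [s]shd')

plain=$(printf '%s\n' "$plain_ps" | grep -c 'sshd')     # counts the daemon AND the grep line
trick=$(printf '%s\n' "$trick_ps" | grep -c '[s]shd')   # counts only the daemon

echo "$plain $trick"
```

On a real system the equivalent command would be ps -ef | grep '[s]shd'.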

-w matches whole words only

$ grep -w 22 /etc/services
ssh		22/tcp				# SSH Remote Login Protocol

sed

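sed is the stream editor.

Printing (p)

-n: suppress the automatic printing of every line, so only the lines the script prints explicitly are shown

Form	Meaning	Example
'3p'	line 3	`sed -n '3p' oldboy.txt`
'4,7p'	lines 4 through 7	`sed -n '4,7p' oldboy.txt`
'4,$p'	line 4 through the last line	`sed -n '4,$p' oldboy.txt`
'$p'	the last line	`sed -n '$p' oldboy.txt`
'/RE/p'	lines matching the regular expression RE	`sed -n '/[45]/p' oldboy.txt` `sed -n '/oldboy/p' oldboy.txt`
'/102/,/105/p'	range: from a line containing 102 through a line containing 105	`sed -n '/11:02:00/,/11:03:01/p' oldboy.txt`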
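The address forms in the table can be tried without any input file by feeding sed the numbers 1 to 10 from seq. This is a sketch; xargs is used here only to join the printed lines with spaces:

```shell
third=$(seq 10 | sed -n '3p')                  # a single line number
middle=$(seq 10 | sed -n '4,7p' | xargs)       # a numeric range
last=$(seq 10 | sed -n '$p')                   # $ addresses the last line
by_regex=$(seq 10 | sed -n '/[45]/p' | xargs)  # every line matching a regex
by_range=$(seq 10 | sed -n '/3/,/5/p' | xargs) # from a line matching 3 through a line matching 5

printf '%s\n' "$third" "$middle" "$last" "$by_regex" "$by_range"
```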

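Deleting (d)

-r: enable extended regular expressions; lines are selected the same way as for p
Delete empty lines and lines containing #: sed -r '/^$|#/d' oldboy.txt. The same result with a negated p: sed -nr '/^$|#/!p' oldboy.txt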

$ sed -r '/^$|#/d' oldboy.txt
I am oldboy teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
my blog is http://oblboy.blog.51cto.com
our size is http://blog.oldboyedu.com
my qq is 49000448
not 4900000448
my god ,i am not oldbey,but OLDBOY!

$ sed -nr '/^$|#/!p' oldboy.txt

Adding and changing lines (a, i, c)

Append after line 3: sed '3a 996daniel,hello' oldboy.txt
Insert before line 3: sed '3i 996daniel,hello' oldboy.txt
Replace line 3: sed '3c 996daniel,hello' oldboy.txt
Append at the end of the file: sed '$a hello,daniel' oldboy.txt
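The effect of the three commands is easy to compare on a three-line input. The sketch below assumes GNU sed, which accepts the new text on the same line as a, i, and c (other sed implementations may require a backslash and a newline after the command letter):

```shell
# Assumes GNU sed (one-line form of a, i, c); X is an arbitrary placeholder text.
appended=$(seq 3 | sed '2a X' | xargs)   # X goes after line 2
inserted=$(seq 3 | sed '2i X' | xargs)   # X goes before line 2
changed=$(seq 3 | sed '2c X' | xargs)    # line 2 is replaced by X
at_end=$(seq 3 | sed '$a X' | xargs)     # X goes after the last line

printf '%s\n' "$appended" "$inserted" "$changed" "$at_end"
```

None of these commands modify a file in place. sed writes the result to standard output unless an option such as -i is given.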

Substitution (s)

sed 's/a/b/g' file.txt

sed 's/[0-9]//g' oldboy.txt		# delete every digit on every line
sed 's/[0-9]/*/g' oldboy.txt	# replace every digit with *
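The trailing g matters: without it, only the first match on each line is replaced. The following sketch on a made-up string also shows a numeric flag, which targets the nth match, and &, which stands for the matched text:

```shell
first_only=$(echo 'a1b2c3' | sed 's/[0-9]//')      # no flag: first match only
global=$(echo 'a1b2c3' | sed 's/[0-9]//g')         # g: every match on the line
second=$(echo 'a1b2c3' | sed 's/[0-9]/*/2')        # 2: only the second match
wrapped=$(echo 'a1b2c3' | sed 's/[0-9]/<&>/g')     # &: re-insert the matched text

printf '%s\n' "$first_only" "$global" "$second" "$wrapped"
```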

Back-references

First capture part of the match with (), then refer to the nth group as \n (\1, \2, ...).
Wrap 12345 in <>:

$ echo 12345 | sed -r 's/([0-9]*)/<\1>/'
<12345>
$ echo 12345 | sed -r 's/(.*)/<\1>/'
<12345>

Swap the parts before and after the underscore: sed -r 's#^(.*)_(.*)$#\2_\1#'

$ echo fuck_you | sed -r "s/(.*)_(.*)/\2_\1/"
you_fuck
#or
$ echo fuck_you | sed -r "s/^([a-z]+)_([a-z]+)/\2_\1/"
you_fuck

$ sed -n '9p' oldboy.txt | sed -r 's/([a-z]+) ([0-9]+)/\2 \1/'
4900000448 not
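Two further sketches of capture groups, on made-up input. The first reorders a date. The others show that a greedy (.*) splits at the last underscore, while ([^_]*) splits at the first. sed -E is used here as the portable spelling of -r:

```shell
# Reorder YYYY-MM-DD into DD/MM/YYYY; '#' is the delimiter so '/' needs no escaping.
date_swapped=$(echo '2020-04-21' | sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#')

# Greedy first group: it takes "a_b", leaving "c" for the second group.
greedy_split=$(echo 'a_b_c' | sed -E 's/(.*)_(.*)/\2_\1/')

# First group may not contain '_': it takes "a", leaving "b_c".
first_split=$(echo 'a_b_c' | sed -E 's/([^_]*)_(.*)/\2_\1/')

printf '%s\n' "$date_swapped" "$greedy_split" "$first_split"
```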

Extract the IP address of a network interface:

$ ifconfig ens33
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.163.130  netmask 255.255.255.0  broadcast 192.168.163.255
        inet6 fe80::bd7f:f48:2f7a:a466  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:96:18:65  txqueuelen 1000  (Ethernet)
        RX packets 557  bytes 51894 (51.8 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 69  bytes 7288 (7.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ ifconfig | sed -n '2p' | sed -r 's/.*inet (.*)  netmask.*$/\1/'
192.168.163.130

awk

Execution flow

awk 'BEGIN{print 1/3}'
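An awk program can have up to three kinds of blocks: BEGIN runs once before any input is read, the main block runs once per input line, and END runs once after the last line. A minimal sketch on two made-up input lines:

```shell
report=$(printf 'a\nb\n' | awk '
BEGIN { print "start" }        # runs once, before any input is read
      { print NR ": " $0 }     # runs for every input line
END   { print "lines:", NR }   # runs once, after the last line
')

printf '%s\n' "$report"
```

This is also why awk 'BEGIN{print 1/3}' needs no input file: a program that consists only of a BEGIN block never reads input.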


Split into columns: awk -F- 'BEGIN{print "ip"}{print $1}END{print "end of file"}' log.txt

$ cat log.txt 
192.168.1.20 - - [21/Apr/2020:14:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [21/Apr/2020:15:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [21/Apr/2020:21:27:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.23 - - [21/Apr/2020:22:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.24 - - [22/Apr/2020:15:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.25 - - [22/Apr/2020:15:26:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.20 - - [23/Apr/2020:08:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:09:20:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:10:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:10:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.20 - - [23/Apr/2020:14:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:15:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:15:27:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.25 - - [23/Apr/2020:16:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.24 - - [23/Apr/2020:20:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.25 - - [23/Apr/2020:20:27:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.20 - - [23/Apr/2020:20:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:20:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:20:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:22:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:23:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
$ awk -F- 'BEGIN{print "ip"}{print $1}END{print "end of file"}' log.txt 
ip
192.168.1.20 
192.168.1.21 
192.168.1.22 
192.168.1.23 
192.168.1.24 
192.168.1.25 
192.168.1.20 
192.168.1.21 
192.168.1.22 
192.168.1.22 
192.168.1.20 
192.168.1.21 
192.168.1.22 
192.168.1.25 
192.168.1.24 
192.168.1.25 
192.168.1.20 
192.168.1.21 
192.168.1.22 
192.168.1.22 
192.168.1.21 
end of file

By default awk splits fields on whitespace, so here the -F option can be left out:

awk 'BEGIN{print "ip"}{print $1}END{print "end of file"}' log.txt

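Terminology:
row = record
column = field

Selecting rows

NR: Number of Records, the number of the current record (line)
NR==1: line 1
NR>=1 && NR<=5: lines 1 through 5
/oldboy/: lines containing 'oldboy'
/102/,/105/: from a line matching 102 through the next line matching 105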

awk 'NR==1' oldboy.txt
awk 'NR>=1 && NR<=5' oldboy.txt
awk '/oldboy/' oldboy.txt

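Selecting columns

-F sets the field separator; the default is whitespace (runs of spaces or tabs)
$n: column n, e.g. ls -l | awk '{print $5}'
$0: the whole line, e.g. ls -l | awk 'NR==3{print $0}'
column -t aligns the output in columns: ll | awk '{print $5,$7,$9}' | column -t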

$ ll
total 20
drwxrwxr-x  2 daniel daniel 4096 Aug 18 19:12 ./
drwxr-xr-x 19 daniel daniel 4096 Aug 18 19:12 ../
-rw-rw-r--  1 daniel daniel    0 Aug 18 05:12 demo.sh
-rw-rw-r--  1 daniel daniel   76 Aug 18 06:17 id.txt
-rw-rw-r--  1 daniel daniel 3486 Aug 18 19:12 log.txt
-rw-rw-r--  1 daniel daniel  241 Aug 18 05:29 oldboy.txt

$ ll | awk '{print $5}'

4096
4096
0
76
3486
241

$ ll | awk 'NR==3{print $0}'
drwxr-xr-x 19 daniel daniel 4096 Aug 18 19:12 ../

$ ll | awk '{print $5,$7,$9}' | column -t
4096  18  ./
4096  18  ../
0     18  demo.sh
76    18  id.txt
3486  18  log.txt
241   18  oldboy.txt

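awk built-in variable	Meaning
NR	current line (record) number
NF	number of fields (columns) in the current line, so $NF is the last column
FS	Field Separator, the input field separator
OFS	Output Field Separator

awk -F: '{print $1" add_something "$NF}' /etc/passwd | column -t

Set the output field separator to a comma: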

awk -F: -vOFS=, '{print $NF,$2,$1}' info.txt
awk -F: '{print $1,"add_something",$NF}' /etc/passwd | column -t
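OFS is only inserted where print receives comma-separated arguments, or when awk rebuilds $0 after a field assignment. The sketch below, on a made-up colon-separated record, contrasts the four cases:

```shell
with_comma=$(echo 'a:b:c' | awk -F: -v OFS=, '{print $1,$3}')    # comma in print: OFS is used
concatenated=$(echo 'a:b:c' | awk -F: -v OFS=, '{print $1 $3}')  # no comma: plain concatenation
rebuilt=$(echo 'a:b:c' | awk -F: -v OFS=, '{$1=$1; print}')      # assigning a field rebuilds $0 with OFS
untouched=$(echo 'a:b:c' | awk -F: -v OFS=, '{print}')           # $0 unchanged: original separators stay

printf '%s\n' "$with_comma" "$concatenated" "$rebuilt" "$untouched"
```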

Extract the interface's IP address: ip a s ens33 | awk 'NR==3' | awk -F"[ /]+" '{print $3}'

daniel@ubuntu:~/sh$ ip a s ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:56:dc:a5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.37.139/24 brd 192.168.37.255 scope global dynamic noprefixroute ens33
       valid_lft 1416sec preferred_lft 1416sec
    inet6 fe80::a362:1f1f:25bf:bd2f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
daniel@ubuntu:~/sh$ ip a s ens33 | awk 'NR==3' | awk -F"[ /]+" '{print $3}'
192.168.37.139

Or, with a single awk: ip a s ens33 | awk -F"[ /]+" 'NR==3 {print $3}'

Pattern matching

Match lines whose third column starts with 1. General form: awk -F: '$N ~ /regex/ {action}' file

awk -F: '$3~/^1/' /etc/passwd

Add an action to the condition:

awk -F: '$3~/^2/{print $NF,$2,$3}' /etc/passwd

awk -F: '$3~/^[12]/{print $1,$3,$NF}' /etc/passwd
awk -F: '$3~/^1|^2/{print $1,$3,$NF}' /etc/passwd | column -t
awk -F: '$3~/^(1|2)/{print $1,$3,$NF}' /etc/passwd | column -t
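Since the contents of /etc/passwd differ between machines, the sketch below uses a few made-up passwd-style records instead. It also shows the negated operator !~ and a numeric comparison, which is often what is really wanted when a column holds numbers:

```shell
# Hypothetical records in name:password:uid:gid form.
passwd_like=$(printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' 'sync:x:20:20' 'user:x:1000:1000')

starts_with_1=$(printf '%s\n' "$passwd_like" | awk -F: '$3 ~ /^1/ {print $1}' | xargs)   # regex match on column 3
not_starting_1=$(printf '%s\n' "$passwd_like" | awk -F: '$3 !~ /^1/ {print $1}' | xargs) # negated regex match
numeric=$(printf '%s\n' "$passwd_like" | awk -F: '$3 >= 1000 {print $1}' | xargs)        # numeric comparison

printf '%s\n' "$starts_with_1" "$not_starting_1" "$numeric"
```

Note that /^1/ is a text match, so it selects both uid 1 and uid 1000.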

Range patterns:

/begin/,/end/
NR==3,NR==5

Print the client IP and request path of the log entries in a time range:

awk '/06:16:.* /,/06:16:.* / {print $1,$7}' access.log
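A range pattern switches on at the first line matching the start pattern and off again at the first line matching the end pattern, and that end line is still printed. The sketch below uses a made-up five-line log in a simplified format (not the format of access.log) with distinct start and end patterns:

```shell
# Hypothetical log: client IP, timestamp, request path.
log=$(printf '%s\n' \
  '10.0.0.1 [06:15:59] /a' \
  '10.0.0.2 [06:16:01] /b' \
  '10.0.0.3 [06:17:30] /c' \
  '10.0.0.4 [06:18:02] /d' \
  '10.0.0.5 [06:18:40] /e')

# The range opens at the first 06:16 entry and closes at the FIRST 06:18 entry.
in_range=$(printf '%s\n' "$log" | awk '/06:16:/,/06:18:/ {print $1, $3}')

printf '%s\n' "$in_range"
```

The last entry (06:18:40) is not printed, because the range has already closed one line earlier.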

BEGIN and END

Sum the integers from 1 to 100:

seq 100 | awk '{sum=sum+$1}END{print sum}'

Arrays

Typically used for log statistics, such as counting how often each IP or each status code appears

$ awk 'BEGIN{a[0]="hello";a[1]="world";print a[0],a[1]}'
hello world

Print every element of an array:

# the shell way
array=(hello world)
for i in "${array[@]}"
do
	echo "$i"
done

# the awk way
awk 'BEGIN{a[0]="hello";a[1]="world";for (i in a) print i,a[i]}'

Count occurrences: awk -F"[/.]+" '{cnt[$2]++}END{for(i in cnt) print i,cnt[i]}' url.txt

daniel@ubuntu:~/sh$ cat url.txt 
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html
daniel@ubuntu:~/sh$ awk -F"[/.]+" '{cnt[$2]++}END{for(i in cnt) print i,cnt[i]}' url.txt 
www 3
post 2
mp3 1

Sort by the number in column 2, in descending order:

daniel@ubuntu:~/sh$ awk -F"[/.]+" '{cnt[$2]++}END{for(i in cnt) print i,cnt[i]}' url.txt | sort -rnk2
www 3
post 2
mp3 1
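The order in which for (i in cnt) visits the keys is not specified by awk, so piping the result through sort is what makes the output stable. The same counting idiom applied to made-up HTTP status codes, as a sketch with a second sort key to break ties:

```shell
counts=$(printf '%s\n' 200 404 200 500 200 404 |
  awk '{cnt[$1]++} END{for (code in cnt) print code, cnt[code]}' |  # count each distinct value
  sort -k2,2nr -k1,1)                                               # by count descending, then by code

printf '%s\n' "$counts"
```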

Control flow

for loops

# for loop in the shell
for((i=0;i<10;++i))
do
	echo $i
done

# for loop in awk
awk 'BEGIN{for(i=0;i<10;++i) print i}'
awk 'BEGIN{for(i=1;i<=100;++i) sum+=i;print sum}'
# with braces, print is part of the loop body and shows the running total
awk 'BEGIN{for(i=1;i<=100;++i) {sum+=i;print sum}}'

# if in the shell
age=18
if [ "$age" -eq 18 ]
then
	echo "hello,man"
fi
# if in awk
seq 100 | awk '{if($0%5==0) print "yes";else print "no"}'
awk 'BEGIN{for(i=0;i<100;++i){ if(i%5==0) print "yes";else print "no";}}'

Exercise: print the words shorter than 6 characters:

echo I am oldboy teacher welcome to oldboy training class | awk -F"[ .]" '{for(i=1;i<=NF;++i) if(length($i)<6) print $i;}'
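As a sketch of how the exercise plays out, the commands below collect the short words and also count them. The default whitespace splitting is used here, since the sentence contains no periods:

```shell
sentence='I am oldboy teacher welcome to oldboy training class'

# Words with fewer than 6 characters, joined onto one line by xargs.
short_words=$(echo "$sentence" | awk '{for (i=1; i<=NF; ++i) if (length($i) < 6) print $i}' | xargs)

# The number of such words; n+0 prints 0 instead of an empty string when none match.
short_count=$(echo "$sentence" | awk '{for (i=1; i<=NF; ++i) if (length($i) < 6) n++} END{print n+0}')

echo "$short_words ($short_count)"
```

"oldboy" has exactly 6 characters, so it is not included.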
