grep searches files for lines that match a pattern:
$ grep 'ui' id.txt
oldboy 57089234758ui
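# Syntax: grep [-nviwcE] 'PATTERN' FILE...
Options:
Option | Meaning |
---|---|
-n | --line-number: prefix each output line with its line number |
-v | --invert-match: print the lines that do NOT match |
-i | --ignore-case: ignore upper/lower case differences |
-w | --word-regexp: match whole words only |
-c | --count: print only the number of matching lines |
-E | --extended-regexp: use extended regular expressions (ERE) |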
Regular expressions
^ (start of line): ^x matches lines that begin with x. Lines starting with "my": grep '^my' oldboy.txt
$ cat -n oldboy.txt
1 I am oldboy teacher!
2 I teach linux.
3
4 I like badminton ball ,billiard ball and chinese chess!
5 my blog is http://oblboy.blog.51cto.com
6 our size is http://blog.oldboyedu.com
7 my qq is 49000448
8
9 not 4900000448
10 my god ,i am not oldbey,but OLDBOY!
$ grep -n '^my' oldboy.txt
5:my blog is http://oblboy.blog.51cto.com
7:my qq is 49000448
10:my god ,i am not oldbey,but OLDBOY!
$ (end of line): x$ matches lines that end with x. Lines ending in 448: grep '448$' oldboy.txt
$ grep '448$' oldboy.txt
my qq is 49000448
not 4900000448
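cat -A shows non-printing characters and marks every line end with $, which makes empty lines and trailing spaces visible: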
$ cat -A oldboy.txt
I am oldboy teacher!$
I teach linux.$
$
I like badminton ball ,billiard ball and chinese chess!$
my blog is http://oblboy.blog.51cto.com $
our size is http://blog.oldboyedu.com $
my qq is 49000448$
$
not 4900000448$
my god ,i am not oldbey,but OLDBOY!$
^$ (empty line): the start of the line is immediately followed by its end
$ grep -n '^$' oldboy.txt
3:
8:
Exclude empty lines: grep -v '^$' oldboy.txt
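A minimal sketch for trying the anchors yourself, assuming a POSIX shell with grep. It rebuilds oldboy.txt in a throwaway temporary directory from the cat -n listing above (the trailing spaces that cat -A reveals are left out), so nothing in the current directory is touched.

```shell
#!/bin/sh
# Rebuild the sample file in a temporary directory, removed when the shell exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/oldboy.txt"
printf '%s\n' \
  'I am oldboy teacher!' \
  'I teach linux.' \
  '' \
  'I like badminton ball ,billiard ball and chinese chess!' \
  'my blog is http://oblboy.blog.51cto.com' \
  'our size is http://blog.oldboyedu.com' \
  'my qq is 49000448' \
  '' \
  'not 4900000448' \
  'my god ,i am not oldbey,but OLDBOY!' > "$f"

grep -n '^my' "$f"   # lines that start with "my", with line numbers (5, 7, 10)
grep '448$' "$f"     # lines that end with "448"
grep -n '^$' "$f"    # line numbers of the empty lines (3: and 8:)
grep -vc '^$' "$f"   # number of non-empty lines (8)
```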
. (any single character): because it needs at least one character to match, grep '.' never matches an empty line:
$ grep '.' oldboy.txt
I am oldboy teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
my blog is http://oblboy.blog.51cto.com
our size is http://blog.oldboyedu.com
my qq is 49000448
not 4900000448
my god ,i am not oldbey,but OLDBOY!
\ (escape character): \. matches a literal dot. Lines ending in a period:
$ grep '\.$' oldboy.txt
I teach linux.
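* : the preceding character, repeated zero or more times in a row.
grep '0*' oldboy.txt matches the digit 0 zero or more times. Because zero occurrences always match, this prints every line, empty ones included.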
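.* : any run of characters, including an empty one.
grep '.*' oldboy.txt prints every line.
Match from the start of each line up to the letter t: grep '^.*t' oldboy.txt
Regular expressions are greedy: .* takes as much as it can, so each match extends to the last t on the line.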
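To see the greedy match, grep -o prints only the matched part of each line. This is a sketch that assumes a POSIX shell and a grep that supports -o (GNU and BSD grep both do). As before, the sample file is rebuilt in a temporary directory.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/oldboy.txt"
printf '%s\n' \
  'I am oldboy teacher!' \
  'I teach linux.' \
  '' \
  'I like badminton ball ,billiard ball and chinese chess!' \
  'my blog is http://oblboy.blog.51cto.com' \
  'our size is http://blog.oldboyedu.com' \
  'my qq is 49000448' \
  '' \
  'not 4900000448' \
  'my god ,i am not oldbey,but OLDBOY!' > "$f"

grep -c '.' "$f"     # 8: the empty lines contain no character for . to match
grep '\.$' "$f"      # the escaped dot matches only a literal "." at the end of a line
grep -o '^.*t' "$f"  # .* is greedy, so each match runs up to the LAST t on its line
```

The last line of output, for example, is "my god ,i am not oldbey,but": the match skips past the t in "not" and stops at the t in "but".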
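[abc] : any one of a, b, or c: grep '[abc]' oldboy.txt
Lowercase letters: grep '[a-z]' oldboy.txt
Uppercase letters: grep '[A-Z]' oldboy.txt
Digits: grep '[0-9]' oldboy.txt
Letters of either case, or digits:
grep '[a-zA-Z0-9]' oldboy.txt
grep '[a-Z0-9]' oldboy.txt #locale-dependent; in the C locale a-Z is an invalid range
grep -i '[a-z0-9]' oldboy.txt #-i ignores case
Note that the "or" is implicit: ranges placed side by side inside one pair of brackets mean "any one of these", with no explicit operator.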
[^abc] : any one character other than a, b, or c: grep '[^abc]' oldboy.txt
Lines not starting with m or n: grep '^[^mn]' oldboy.txt
Lines not ending in a digit: grep '[^0-9]$' oldboy.txt
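A quick way to check bracket expressions is to count matching lines with -c. This sketch assumes a POSIX shell. It sets LC_ALL=C so that ranges such as [a-z] mean plain ASCII letters, and it uses the POSIX class [[:alnum:]] as a portable replacement for the locale-dependent [a-Z0-9].

```shell
#!/bin/sh
export LC_ALL=C   # predictable ASCII ranges
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/oldboy.txt"
printf '%s\n' \
  'I am oldboy teacher!' \
  'I teach linux.' \
  '' \
  'I like badminton ball ,billiard ball and chinese chess!' \
  'my blog is http://oblboy.blog.51cto.com' \
  'our size is http://blog.oldboyedu.com' \
  'my qq is 49000448' \
  '' \
  'not 4900000448' \
  'my god ,i am not oldbey,but OLDBOY!' > "$f"

grep -c '[0-9]' "$f"        # 3 lines contain at least one digit
grep -c '^[^mn]' "$f"       # 4 lines start with a character other than m or n
grep -c '[^0-9]$' "$f"      # 6 lines end with a non-digit (empty lines have no last char)
grep -c '[[:alnum:]]' "$f"  # 8: every non-empty line has a letter or digit
```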
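Basic regex summary
Basic regex | Meaning |
---|---|
^ | start of line |
$ | end of line |
^$ | empty line |
. | any single character |
* | the preceding character, zero or more times in a row |
.* | any run of characters, including an empty one |
\ | escape character |
[] | any one of the listed characters |
[^] | any one character that is not listed |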
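Extended regular expressions (ERE):
+ : the preceding character, one or more times
grep needs -E (or egrep) to understand ERE: grep -E '0+' oldboy.txt
Match runs of digits, or runs of lowercase letters (words):
grep -E '[0-9]+' oldboy.txt
grep -E '[a-z]+' oldboy.txt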
| : alternation (or), e.g.:
grep -E 'oldboy|oldbey' oldboy.txt
$ grep -E 'oldboy|oldbey' oldboy.txt
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!
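How | differs from []: [] matches exactly one character from the set, while | chooses between whole alternatives, which can be strings of any length.
() : groups several characters into one unit, e.g. grep -E 'oldb(o|e)y' oldboy.txt
The same three lines can be matched with |, with a bracket expression, or with ():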
$ grep -E 'oldboy|oldbey' oldboy.txt
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!
$ grep "oldb[oe]y" oldboy.txt
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!
$ grep -E 'oldb(o|e)y' oldboy.txt
I am oldboy teacher!
our size is http://blog.oldboyedu.com
my god ,i am not oldbey,but OLDBOY!
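x{n,m} : the preceding character x, at least n and at most m times in a row
grep -E '0{3,4}' oldboy.txt : 0 at least 3 and at most 4 times in a row
grep -E '0{3}' oldboy.txt : 0 exactly 3 times in a row
grep -E '0{3,}' oldboy.txt : 0 at least 3 times in a row
grep -E '0{,3}' oldboy.txt : 0 at most 3 times (a GNU extension; since zero times also qualifies, every line matches)
grep matches substrings, so a line with five zeros in a row still matches 0{3} and 0{3,4}; anchor or delimit the pattern if longer runs must be excluded.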
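The sketch below, assuming a POSIX shell and a grep that supports -E and -o, checks that the three spellings of "oldboy or oldbey" select the same lines. It also uses -o to show what {3,4} actually matches inside the two phone-number lines.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/oldboy.txt"
printf '%s\n' \
  'I am oldboy teacher!' \
  'I teach linux.' \
  '' \
  'I like badminton ball ,billiard ball and chinese chess!' \
  'my blog is http://oblboy.blog.51cto.com' \
  'our size is http://blog.oldboyedu.com' \
  'my qq is 49000448' \
  '' \
  'not 4900000448' \
  'my god ,i am not oldbey,but OLDBOY!' > "$f"

grep -cE 'oldboy|oldbey' "$f"  # 3: alternation between whole strings
grep -cE 'oldb(o|e)y' "$f"     # 3: the alternation is grouped inside the word
grep -c 'oldb[oe]y' "$f"       # 3: a bracket expression, no ERE needed
grep -oE '0{3,4}' "$f"         # "000" from 49000448, then "0000" from the run of five zeros
```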
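? : the preceding character, zero or one time
egrep 'go?d' oldboy.txt (egrep is the same as grep -E; recent GNU grep releases warn that egrep is obsolescent)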
$ grep -E 'o?bl' oldboy.txt
my blog is http://oblboy.blog.51cto.com
our size is http://blog.oldboyedu.com
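Extended regex summary
Symbol | Meaning |
---|---|
+ | the preceding character, one or more times in a row |
\| | or (alternation) |
() | groups into one unit; in sed, also a capture group for back-references |
{n,m} | at least n and at most m times in a row |
? | the preceding character, zero or one time |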
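Match an 18-character Chinese ID number (17 digits followed by a digit or X):
grep -E '[0-9]{17}[0-9X]' id.txt
Exclude empty lines and lines containing #: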
grep -E -v '^$|#' file.txt
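Because grep matches substrings, the ID pattern also matches inside any longer run of digits. This sketch uses an id.txt filled with made-up, hypothetical sample data (not real ID numbers) and assumes GNU or BSD grep. It shows the unanchored match, a stricter version using -w, and the filter that drops empty and comment lines.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
ids="$dir/id.txt"
# Hypothetical sample data for illustration only
printf '%s\n' \
  '# name id' \
  'alice 12345678901234567X' \
  'bob 123456789012345678' \
  '' \
  'carol 12345' \
  'dave 12345678901234567890' > "$ids"

grep -E '[0-9]{17}[0-9X]' "$ids"    # alice, bob, and also dave's 20-digit number
grep -Ew '[0-9]{17}[0-9X]' "$ids"   # -w: the 18 characters must form a whole word (alice, bob)
grep -Ev '^$|#' "$ids"              # everything except the empty line and the comment
```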
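The "three musketeers" of text processing:
Command | Typical use |
---|---|
grep | filtering lines |
sed | substitution; editing file contents; selecting lines |
awk | selecting columns; counting and arithmetic |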
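grep options
Option | Meaning |
---|---|
-A | after: also print N lines after each match |
-B | before: also print N lines before each match |
-C | context: print N lines before and after each match |
-c | count the matching lines |
-v | invert the match |
-n | show line numbers |
-i | ignore case |
-w | word-regexp: match whole words only |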
$ seq 10 | grep -A5 2
2
3
4
5
6
7
$ seq 10 | grep -B3 9
6
7
8
9
$ seq 10 | grep -C2 5
3
4
5
6
7
$ grep -A3 '^our' oldboy.txt
our size is http://blog.oldboyedu.com
my qq is 49000448
not 4900000448
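-c counts matching lines, giving the same number as piping through wc -l (with ps, the count can include the grep process itself):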
$ ps -ef | grep sshd | wc -l
2
$ ps -ef | grep -c sshd
2
-v excludes matching lines; a common use is dropping the grep process itself from ps output: ps -ef | grep crond | grep -v grep
$ ps -ef | grep fuck | grep -vc 'grep'
0
-w matches whole words only, so 22 does not match inside 220 or 1022:
$ grep -w 22 /etc/services
ssh 22/tcp # SSH Remote Login Protocol
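The context, count, and word options are easy to compare on seq output, because each line is just a number. This is a small sketch that assumes a POSIX shell with seq and grep available.

```shell
#!/bin/sh
seq 10 | grep -A2 5    # 5 6 7: the match plus 2 lines after it
seq 10 | grep -B2 5    # 3 4 5: 2 lines before, then the match
seq 10 | grep -C1 5    # 4 5 6: 1 line on each side
seq 20 | grep -c 2     # 3: "2", "12" and "20" all contain a 2
seq 20 | grep -wc 2    # 1: only the line that is exactly the word "2"
```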
sed (stream editor)
Printing lines: p
-n suppresses sed's automatic printing, so only the lines the script prints explicitly appear.
Format | Meaning | Example |
---|---|---|
'3p' | line 3 | sed -n '3p' oldboy.txt |
'4,7p' | lines 4 through 7 | sed -n '4,7p' oldboy.txt |
'4,$p' | line 4 through the last line | sed -n '4,$p' oldboy.txt |
'$p' | the last line | sed -n '$p' oldboy.txt |
'/RE/p' | lines matching the regular expression RE | sed -n '/[45]/p' oldboy.txt; sed -n '/oldboy/p' oldboy.txt |
'/102/,/105/p' | a range: from a line containing 102 through the next line containing 105 | sed -n '/11:02:00/,/11:03:01/p' oldboy.txt |
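Here is a sketch of the addresses in the table, run against a rebuilt copy of oldboy.txt. It assumes a POSIX shell and sed. grep -c '' is used only to count output lines portably, and the last command shows why -n matters.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/oldboy.txt"
printf '%s\n' \
  'I am oldboy teacher!' \
  'I teach linux.' \
  '' \
  'I like badminton ball ,billiard ball and chinese chess!' \
  'my blog is http://oblboy.blog.51cto.com' \
  'our size is http://blog.oldboyedu.com' \
  'my qq is 49000448' \
  '' \
  'not 4900000448' \
  'my god ,i am not oldbey,but OLDBOY!' > "$f"

sed -n '2p' "$f"           # I teach linux.
sed -n '4,7p' "$f"         # lines 4 to 7
sed -n '$p' "$f"           # the last line
sed -n '/oldboy/p' "$f"    # 2 lines (case-sensitive, so "OLDBOY" alone does not count)
sed -n '/^our/,/qq/p' "$f" # from the "our ..." line through the next line containing "qq"
sed '2p' "$f"              # without -n: every line, with line 2 printed twice (11 lines)
```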
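Deleting lines: d
-r enables extended regular expressions (GNU sed; -E also works); addresses work the same way as for p.
Delete empty lines and lines containing #: sed -r '/^$|#/d' oldboy.txt
Or invert p with !, printing every line that does not match: sed -nr '/^$|#/!p' oldboy.txt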
$ sed -r '/^$|#/d' oldboy.txt
I am oldboy teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
my blog is http://oblboy.blog.51cto.com
our size is http://blog.oldboyedu.com
my qq is 49000448
not 4900000448
my god ,i am not oldbey,but OLDBOY!
$ sed -nr '/^$|#/!p' oldboy.txt
(prints the same eight lines)
Adding and changing lines: a, i, c
Append a line after line 3: sed '3a 996daniel,hello' oldboy.txt
Insert a line before line 3: sed '3i 996daniel,hello' oldboy.txt
Replace line 3: sed '3c 996daniel,hello' oldboy.txt
Append after the last line: sed '$a hello,daniel' oldboy.txt
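The one-line forms like '3a text' are a GNU sed extension (POSIX sed expects a\ followed by a newline), so this sketch assumes GNU sed. It runs a, i, and c on seq output to show where the new text lands. By default sed only writes to standard output; the last part uses -i.bak to edit a throwaway file in place and keep a backup copy.

```shell
#!/bin/sh
seq 5 | sed '3a hello'   # 1 2 3 hello 4 5: appended after line 3
seq 5 | sed '3i hello'   # 1 2 hello 3 4 5: inserted before line 3
seq 5 | sed '3c hello'   # 1 2 hello 4 5: line 3 replaced
seq 3 | sed '$a end'     # 1 2 3 end: appended after the last line

# Edit a temporary file in place; the original is kept as n.txt.bak
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
seq 3 > "$dir/n.txt"
sed -i.bak '$a end' "$dir/n.txt"
cat "$dir/n.txt"         # 1 2 3 end
```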
Substitution: s (substitute)
sed 's/a/b/g' file.txt
sed 's/[0-9]//g' oldboy.txt #delete every digit; g means all matches on the line, not just the first
sed 's/[0-9]/*/g' oldboy.txt #replace every digit with *
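The effect of the g flag is easiest to see on a single short string. This sketch assumes a POSIX shell and sed. It also shows that the / delimiter can be any character, which is handy when the pattern itself contains slashes.

```shell
#!/bin/sh
echo 'a1b2c3' | sed 's/[0-9]/*/'    # a*b2c3: without g only the first match changes
echo 'a1b2c3' | sed 's/[0-9]/*/g'   # a*b*c*
echo 'a1b2c3' | sed 's/[0-9]//g'    # abc
echo '/usr/local/bin' | sed 's#/usr/local#/opt#'   # /opt/bin, using # as the delimiter
```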
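Back-references
First capture part of the match with (), then refer to it in the replacement with \n (\1, \2, ...). With -r the parentheses need no backslashes; in basic mode write \( and \).
Wrap 12345 in <>: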
$ echo 12345 | sed -r 's/([0-9]*)/<\1>/'
<12345>
$ echo 12345 | sed -r 's/(.*)/<\1>/'
<12345>
Swap the parts before and after the underscore: sed -r 's#^(.*)_(.*)$#\2_\1#'
$ echo fuck_you | sed -r "s/(.*)_(.*)/\2_\1/"
you_fuck
#or
$ echo fuck_you | sed -r "s/^([a-z]+)_([a-z]+)/\2_\1/"
you_fuck
$ sed -n '9p' oldboy.txt | sed -r 's/([a-z]+) ([0-9]+)/\2 \1/'
4900000448 not
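Here are the same swap written in basic and extended syntax, plus a case where greediness decides the split. This sketch assumes a sed that accepts -E for extended regexps (GNU and BSD sed both do; GNU sed also accepts -r). The last line uses &, which stands for the whole match.

```shell
#!/bin/sh
echo hello_world | sed 's/\(.*\)_\(.*\)/\2_\1/'   # world_hello: BRE groups need \( \)
echo hello_world | sed -E 's/(.*)_(.*)/\2_\1/'    # world_hello: ERE groups are plain ( )
echo a_b_c | sed -E 's/(.*)_(.*)/\2_\1/'          # c_a_b: the first .* is greedy and takes "a_b"
echo 12345 | sed -E 's/[0-9]+/<&>/'               # <12345>: & inserts the entire match
```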
Extract the IP address of a network interface:
$ ifconfig ens33
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.163.130 netmask 255.255.255.0 broadcast 192.168.163.255
inet6 fe80::bd7f:f48:2f7a:a466 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:96:18:65 txqueuelen 1000 (Ethernet)
RX packets 557 bytes 51894 (51.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 69 bytes 7288 (7.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ ifconfig | sed -n '2p' | sed -r 's/.*inet (.*) netmask.*$/\1/'
192.168.163.130
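The capture (.*) above stops at " netmask", so any extra spaces before netmask end up in the result. This sketch assumes that real ifconfig output separates fields with two spaces, which is common in net-tools output even though the listing above shows single spaces. It feeds a copy of the first two lines through sed and compares that capture with a tighter [0-9.]+ capture. No live interface is needed.

```shell
#!/bin/sh
# First two lines of the ifconfig output above, assuming double-space separators
sample='ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.163.130  netmask 255.255.255.0  broadcast 192.168.163.255'

# Original approach: (.*) also captures the extra space before "netmask"
printf '%s\n' "$sample" | sed -n '2p' | sed -E 's/.*inet (.*) netmask.*$/\1/'
# Tighter: capture only digits and dots, on line 2, in a single sed
printf '%s\n' "$sample" | sed -nE '2s/.*inet ([0-9.]+).*/\1/p'
```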
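awk
Execution flow: the BEGIN block runs before any input is read, the main block runs once for every input line, and the END block runs after the last line.
awk 'BEGIN{print 1/3}' (prints 0.333333 without reading any input)
Split into columns: awk -F- 'BEGIN{print "ip"}{print $1}END{print "end of file"}' log.txt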
$ cat log.txt
192.168.1.20 - - [21/Apr/2020:14:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [21/Apr/2020:15:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [21/Apr/2020:21:27:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.23 - - [21/Apr/2020:22:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.24 - - [22/Apr/2020:15:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.25 - - [22/Apr/2020:15:26:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.20 - - [23/Apr/2020:08:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:09:20:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:10:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:10:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.20 - - [23/Apr/2020:14:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:15:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:15:27:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.25 - - [23/Apr/2020:16:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.24 - - [23/Apr/2020:20:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.25 - - [23/Apr/2020:20:27:49 +0800] "GET /3/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.20 - - [23/Apr/2020:20:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:20:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:20:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.22 - - [23/Apr/2020:22:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
192.168.1.21 - - [23/Apr/2020:23:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
$ awk -F- 'BEGIN{print "ip"}{print $1}END{print "end of file"}' log.txt
ip
192.168.1.20
192.168.1.21
192.168.1.22
192.168.1.23
192.168.1.24
192.168.1.25
192.168.1.20
192.168.1.21
192.168.1.22
192.168.1.22
192.168.1.20
192.168.1.21
192.168.1.22
192.168.1.25
192.168.1.24
192.168.1.25
192.168.1.20
192.168.1.21
192.168.1.22
192.168.1.22
192.168.1.21
end of file
awk splits fields on whitespace by default, so -F can be dropped here:
awk 'BEGIN{print "ip"}{print $1}END{print "end of file"}' log.txt
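The two commands above differ slightly. With -F-, $1 keeps the space that precedes the first "-", while default splitting trims blanks. This sketch assumes any POSIX awk. It uses two shortened lines shaped like log.txt and also shows that, with default splitting, $7 is the request path.

```shell
#!/bin/sh
# Two shortened sample lines in the same format as log.txt
log='192.168.1.20 - - [21/Apr/2020:14:27:49 +0800] "GET /1/index.php HTTP/1.1" 404 490
192.168.1.21 - - [21/Apr/2020:15:27:49 +0800] "GET /2/index.php HTTP/1.1" 404 490'

# BEGIN runs once, the main block once per line, END once at the end
printf '%s\n' "$log" | awk 'BEGIN{print "ip"} {print $1} END{print "lines:", NR}'
printf '%s\n' "$log" | awk -F- '{print "[" $1 "]"}'  # [192.168.1.20 ] with a trailing space
printf '%s\n' "$log" | awk '{print "[" $1 "]"}'      # [192.168.1.20]
printf '%s\n' "$log" | awk '{print $1, $7}'          # IP and request path
```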
Terminology:
line = record
column = field
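Selecting lines
NR (Number of Record) is the current line number.
NR==1 : line 1
NR>=1 && NR<=5 : lines 1 to 5
/oldboy/ : lines containing oldboy
/102/,/105/ : from a line containing 102 through the next line containing 105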
awk 'NR==1' oldboy.txt
awk 'NR>=1 && NR<=5' oldboy.txt
awk '/oldboy/' oldboy.txt
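The line selectors can be tried on seq output. This sketch assumes any POSIX awk. A pattern with no action prints the whole line, and a comparison and a range can select the same lines.

```shell
#!/bin/sh
seq 10 | awk 'NR==1'            # 1
seq 10 | awk 'NR>=3 && NR<=5'   # 3 4 5
seq 10 | awk 'NR==3,NR==5'      # 3 4 5 again, written as a range
seq 20 | awk '/^1/'             # 1 and 10 to 19: regex patterns test the whole line
```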
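Selecting columns
-F : the field separator; by default one or more spaces or tabs
$n : field n, e.g. ls -l | awk '{print $5}'
$0 : the whole line, e.g. ls -l | awk 'NR==3{print $0}'
column -t aligns the output into columns (ll is a shell alias, on Ubuntu for ls -alF): ll | awk '{print $5,$7,$9}' | column -t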
$ ll
total 20
drwxrwxr-x 2 daniel daniel 4096 Aug 18 19:12 ./
drwxr-xr-x 19 daniel daniel 4096 Aug 18 19:12 ../
-rw-rw-r-- 1 daniel daniel 0 Aug 18 05:12 demo.sh
-rw-rw-r-- 1 daniel daniel 76 Aug 18 06:17 id.txt
-rw-rw-r-- 1 daniel daniel 3486 Aug 18 19:12 log.txt
-rw-rw-r-- 1 daniel daniel 241 Aug 18 05:29 oldboy.txt
$ ll | awk '{print $5}'
4096
4096
0
76
3486
241
$ ll | awk 'NR==3{print $0}'
drwxr-xr-x 19 daniel daniel 4096 Aug 18 19:12 ../
$ ll | awk '{print $5,$7,$9}' | column -t
4096 18 ./
4096 18 ../
0 18 demo.sh
76 18 id.txt
3486 18 log.txt
241 18 oldboy.txt
awk built-in variable | Meaning |
---|---|
NR | current record (line) number |
NF | number of fields in the current line, so $NF is the last field |
FS | Field Separator, the input field separator |
OFS | Output Field Separator |
awk -F: '{print $1" add_something "$NF}' /etc/passwd | column -t
Set the output field separator to a comma:
awk -F: -vOFS=, '{print $NF,$2,$1}' info.txt
awk -F: '{print $1,"add_something",$NF}' /etc/passwd | column -t
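OFS is inserted only where print has commas, or when $0 is rebuilt after a field is assigned. This sketch assumes any POSIX awk and uses a single /etc/passwd-style line as sample input. It contrasts NF and $NF, commas versus string concatenation, and the $1=$1 idiom for re-joining a whole line with OFS.

```shell
#!/bin/sh
line='root:x:0:0:root:/root:/bin/bash'

echo "$line" | awk -F: '{print NF, $NF}'                 # 7 /bin/bash
echo "$line" | awk -F: -v OFS=, '{print $NF,$2,$1}'      # /bin/bash,x,root
echo "$line" | awk -F: -v OFS=, '{$1=$1; print}'         # assigning a field rebuilds $0 with OFS
echo "$line" | awk -F: '{print $1" add_something "$NF}'  # concatenation: no OFS involved
```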
Extract the interface's IP address: ip a s ens33 | awk 'NR==3' | awk -F"[ /]+" '{print $3}'
daniel@ubuntu:~/sh$ ip a s ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:56:dc:a5 brd ff:ff:ff:ff:ff:ff
inet 192.168.37.139/24 brd 192.168.37.255 scope global dynamic noprefixroute ens33
valid_lft 1416sec preferred_lft 1416sec
inet6 fe80::a362:1f1f:25bf:bd2f/64 scope link noprefixroute
valid_lft forever preferred_lft forever
daniel@ubuntu:~/sh$ ip a s ens33 | awk 'NR==3' | awk -F"[ /]+" '{print $3}'
192.168.37.139
Or with a single awk: ip a s ens33 | awk -F"[ /]+" 'NR==3 {print $3}'
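Why is the address in $3 and not $2? In real ip output the inet line is indented with spaces, and with a regex FS such as "[ /]+" a leading separator produces an empty first field. This sketch assumes any POSIX awk and feeds in an indented copy of the inet line from the listing above, so no network interface is needed.

```shell
#!/bin/sh
# The inet line from the ip output above, with the indentation ip normally prints
line='    inet 192.168.37.139/24 brd 192.168.37.255 scope global dynamic noprefixroute ens33'

# $1 is empty (the text before the leading spaces), $2 is "inet", $3 is the address
printf '%s\n' "$line" | awk -F'[ /]+' '{print "[" $1 "]", $2, $3}'
```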
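Pattern matching
General form: awk -F: '$N~/regex/{action}' file, where $N is the field to test.
Lines whose 3rd field starts with 1:
awk -F: '$3~/^1/' /etc/passwd
Add an action to the condition: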
awk -F: '$3~/^2/{print $NF,$2,$3}' /etc/passwd
awk -F: '$3~/^[12]/{print $1,$3,$NF}' /etc/passwd
awk -F: '$3~/^1|^2/{print $1,$3,$NF}' /etc/passwd | column -t
awk -F: '$3~/^(1|2)/{print $1,$3,$NF}' /etc/passwd | column -t
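So that the result does not depend on the local /etc/passwd, this sketch feeds in a few hypothetical passwd-style lines (name:password:UID:GID:comment:home:shell) and assumes any POSIX awk. It confirms that the bracket and grouped-alternation forms agree, and it contrasts them with a numeric test on the same field.

```shell
#!/bin/sh
# Hypothetical passwd-style sample lines
users='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
daniel:x:1000:1000:daniel:/home/daniel:/bin/bash'

printf '%s\n' "$users" | awk -F: '$3~/^[12]/{print $1,$3,$NF}'    # daemon, bin, daniel
printf '%s\n' "$users" | awk -F: '$3~/^(1|2)/{print $1,$3,$NF}'   # same three lines
printf '%s\n' "$users" | awk -F: '$3>=1000{print $1}'             # numeric comparison: daniel
```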
Range matching:
/begin/,/end/
NR==3,NR==5
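Print the IP ($1) and request path ($7) for a time range of an access log:
awk '/06:16:.* /,/06:16:.* / {print $1,$7}' access.log
Here the start and end patterns are identical, so each range opens and closes on the same line; use different start and end times to select a span.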
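Range patterns behave slightly differently in awk and sed when the start and end patterns can match the same line. This sketch assumes POSIX awk and sed. awk checks the end pattern on the same line that opened the range, while sed starts looking for the end address on the following line.

```shell
#!/bin/sh
seq 20 | awk '/^3$/,/^6$/'    # 3 4 5 6
seq 20 | awk '/5/,/5/'        # 5 and 15: each range opens and closes on the same line
seq 20 | sed -n '/5/,/5/p'    # 5 through 15: sed looks for the end from the next line on
```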
BEGIN and END
Sum 1 through 100:
seq 100 | awk '{sum=sum+$1}END{print sum}'
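In END, NR still holds the number of the last record read, so an average needs no extra counter. This is a short sketch assuming any POSIX awk.

```shell
#!/bin/sh
seq 100 | awk '{sum += $1} END{print sum}'      # 5050
seq 100 | awk '{sum += $1} END{print sum/NR}'   # 50.5: NR is 100 when END runs
```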
Arrays
Typically used to tally logs, e.g. how many times each IP or each status code appears
$ awk 'BEGIN{a[0]="hello";a[1]="world";print a[0],a[1]}'
hello world
Print every element of an array:
#in the shell (bash)
for i in "${array[@]}"
do
echo "$i"
done
#in awk (for-in visits the keys in an unspecified order)
awk 'BEGIN{a[0]="hello";a[1]="world";for (i in a) print i,a[i]}'
Count occurrences: awk -F"[/.]+" '{cnt[$2]++}END{for(i in cnt) print i,cnt[i]}' url.txt
daniel@ubuntu:~/sh$ cat url.txt
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html
daniel@ubuntu:~/sh$ awk -F"[/.]+" '{cnt[$2]++}END{for(i in cnt) print i,cnt[i]}' url.txt
www 3
post 2
mp3 1
Sort numerically on the 2nd column, in descending order:
daniel@ubuntu:~/sh$ awk -F"[/.]+" '{cnt[$2]++}END{for(i in cnt) print i,cnt[i]}' url.txt | sort -rnk2
www 3
post 2
mp3 1
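With FS set to "[/.]+", the line http://www.etiantian.org/index.html splits into "http:", "www", "etiantian", "org", "index", "html", so $2 is the host prefix. This sketch assumes POSIX awk and sort and uses the url.txt contents shown above. It pipes the counts through sort because the order of for-in is unspecified.

```shell
#!/bin/sh
urls='http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html'

printf '%s\n' "$urls" | awk -F'[/.]+' 'NR==1{print $1, $2}'   # http: www
# Count per host prefix, then sort by the count (2nd column), largest first
printf '%s\n' "$urls" |
  awk -F'[/.]+' '{cnt[$2]++} END{for (k in cnt) print k, cnt[k]}' |
  sort -k2,2nr
```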
Control flow
for loops
#for loop in bash
for((i=0;i<10;++i))
do
echo $i
done
#for loop in awk
awk 'BEGIN{for(i=0;i<10;++i) print i}'
awk 'BEGIN{for(i=1;i<=100;++i) sum+=i;print sum}'
#with braces, print is part of the loop body and shows the running total
awk 'BEGIN{for(i=1;i<=100;++i) {sum+=i;print sum}}'
if
#if in bash
if [ "$age" -eq 18 ]
then
echo "hello,man"
fi
#if in awk
seq 100 | awk '{if($0%5==0) print "yes";else print "no"}'
awk 'BEGIN{for(i=0;i<100;++i){ if(i%5==0) print "yes";else print "no";}}'
Exercise: print the words shorter than 6 characters:
echo I am oldboy teacher welcome to oldboy training class |
awk -F"[ .]" '{for(i=1;i<=NF;++i) if(length($i)<6) print $i;}'
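Here is a runnable version of the exercise, assuming any POSIX awk. It relies on the default whitespace splitting and adds a variant that counts the short words instead of printing them.

```shell
#!/bin/sh
sentence='I am oldboy teacher welcome to oldboy training class'

# Print each word with fewer than 6 characters: I, am, to, class
echo "$sentence" | awk '{for (i = 1; i <= NF; ++i) if (length($i) < 6) print $i}'
# Count them instead
echo "$sentence" | awk '{for (i = 1; i <= NF; ++i) if (length($i) < 6) n++} END{print n}'
```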