Linux Text-Processing Commands: grep, sed, and awk

ycl686

Category: Linux  Tags: Linux, text processing, sed, grep, awk

Article link: https://blog.csdn.net/qq_29688997/article/details/82020185

Lately I have been hooked on the Linux shell, and its text-processing commands grep, sed, and awk turn out to have real depth. This post is my summary of them.

1.grep

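First up is grep, which searches files for a given string (more precisely, a pattern). When a line contains a match, grep prints that line by default. The match is highlighted only when color output is enabled, as it is on distributions that alias grep to grep --color=auto.
grep syntax: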

grep [-abcEFGhHilLnqrsvVwxy][-A<number of lines>][-B<number of lines>][-C<number of lines>]
[-d<action>][-e<pattern>][-f<pattern file>][--help][pattern][file or directory...]

That is a lot of options, so let's start with the simplest form: grep STRING FILE

[root@localhost ~]# cat /etc/test.txt # contents of the test file
Hi,what's your name?
I'm very happy to meet you. 
How do you do?
I'm fine,thanks.
Would you like to drink sth?
Myname is Linux.
I wanna drink some coffee.
It's a nice day,isn't it?
Where are you from?

[root@localhost ~]# grep you /etc/test.txt 
Hi,what's your name?
I'm very happy to meet you. 
How do you do?
Would you like to drink sth?
Where are you from?
# Prints every line of test.txt that contains "you". The highlighting is lost in the pasted output, sorry.
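
A few of the options in the synopsis come up constantly in practice. The sketch below is a minimal illustration, assuming GNU grep. It recreates the test file in a temporary location instead of touching /etc/test.txt; the variable name `sample` is introduced only for this example.

```shell
#!/bin/sh
# Recreate the article's test file in a temporary location (hypothetical path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Hi,what's your name?
I'm very happy to meet you.
How do you do?
I'm fine,thanks.
Would you like to drink sth?
Myname is Linux.
I wanna drink some coffee.
It's a nice day,isn't it?
Where are you from?
EOF

# -n prefixes each matching line with its line number.
numbered=$(grep -n you "$sample")
# -c prints only the number of matching lines.
count=$(grep -c you "$sample")
# -i ignores case, so "linux" matches "Linux".
nocase=$(grep -i linux "$sample")
# -w matches whole words only, so "your" on line 1 no longer counts.
words=$(grep -cw you "$sample")

printf '%s\n' "$numbered" "count=$count" "$nocase" "whole-word count=$words"
rm -f "$sample"
```

The options combine freely, so `grep -inw you FILE` would give a case-insensitive, whole-word search with line numbers.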

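Now let's add an option such as -r (recursive). The command below searches /etc/sysconfig recursively for lines containing the string update. Besides each matching line, the output also shows the name of the file it came from.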

[root@localhost sysconfig]# grep -r update /etc/sysconfig 
/etc/sysconfig/network-scripts/ifup-TeamPort:       /usr/bin/teamdctl ${TEAM_MASTER} port config update ${DEVICE} "${TEAM_PORT_CONFIG}" || exit 1
/etc/sysconfig/network-scripts/ifdown-post:update_DNS_entries
/etc/sysconfig/network-scripts/ifup-aliases:# addrs will be updated on existing aliases, and new aliases will be setup.
/etc/sysconfig/network-scripts/ifup-aliases:        # update ARP cache of neighboring computers:
/etc/sysconfig/network-scripts/ifup-eth:            # update ARP cache of neighboring computers
/etc/sysconfig/network-scripts/ifup-eth:            if ! is_false "${arpupdate[$idx]}" && [ "${REALDEVICE}" != "lo" ]; then
/etc/sysconfig/network-scripts/ifup-post:    update_DNS_entries

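Next, try -v, which inverts the match and prints only the lines that do not match. The pattern here is you*. As covered in the previous article on regular expressions, * repeats the preceding character, so you* means "yo" followed by zero or more "u" (it does not mean "a string starting with you"). In test.txt this still removes every line containing you or your. The pattern should be quoted, otherwise the shell may expand you* as a filename glob before grep ever sees it.

[root@localhost /]# grep -v 'you*' /etc/test.txt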
I'm fine,thanks.
Myname is Linux.
I wanna drink some coffee.
It's a nice day,isn't it?

2.sed

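Next comes sed, a stream editor that processes and edits text according to a script. It is mainly used to edit one or more files automatically, to simplify repetitive edits, and to write conversion programs.
sed syntax: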

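sed [-hnV][-e<script>][-f<script file>][text file]
# There are only a few options, so here is a brief rundown.
# -h  print help (current GNU sed documents the long form --help)
# -n  suppress automatic printing; only output requested explicitly, for example by p, is shown
# -V  print the version (--version in current GNU sed)
# -e  expression: the script to run, built from commands such as a (append), d (delete), i (insert), p (print), and s (substitute)
# -f  script file: the commands that would follow -e can be kept in a file instead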

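Now let's use sed to append the string "I'm new here!" after the last line of test.txt. Because the script is single-quoted, the apostrophe in the text is written as '\''.

[root@localhost /]# sed -e '9a I'\''m new here!' /etc/test.txt # the test file has 9 lines; a appends after the addressed line, hence 9 rather than 10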
Hi,what's your name?
I'm very happy to meet you. 
How do you do?
I'm fine,thanks.
Would you like to drink sth?
Myname is Linux.
I wanna drink some coffee.
It's a nice day,isn't it?
Where are you from?
I'm new here!  # the newly appended line
[root@localhost /]# cat /etc/test.txt # cat the file again: something seems to be missing
Hi,what's your name?
I'm very happy to meet you. 
How do you do?
I'm fine,thanks.
Would you like to drink sth?
Myname is Linux.
I wanna drink some coffee.
It's a nice day,isn't it?
Where are you from?
[root@localhost /]#

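Attentive readers will notice that a second cat shows test.txt unchanged. That is because sed operates on a stream and writes its result to standard output; it does not modify the file itself. Once the output looks right, it can be used to overwrite the source file.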

[root@localhost /]# sed -e '9a I'\''m new here!' /etc/test.txt > /etc/test.txt.tmp # write the result to a temporary file
[root@localhost /]# cat /etc/test.txt.tmp # check that the temporary file is correct
Hi,what's your name?
I'm very happy to meet you.
How do you do?
I'm fine,thanks.
Would you like to drink sth?
Myname is Linux.
I wanna drink some coffee.
It's a nice day,isn't it?
Where are you from?
I'm new here!
[root@localhost /]# mv /etc/test.txt.tmp /etc/test.txt # overwrite the source file
mv: overwrite '/etc/test.txt'? y
[root@localhost /]# cat /etc/test.txt
Hi,what's your name?
I'm very happy to meet you.
How do you do?
I'm fine,thanks.
Would you like to drink sth?
Myname is Linux.
I wanna drink some coffee.
It's a nice day,isn't it?
Where are you from?
I'm new here!  # the source file test.txt has now been modified
[root@localhost /]# 
# There is another way: the -i option edits the file in place. It is not recommended, because the result cannot be reviewed before the file changes.
[root@localhost /]# sed -i '9a I'\''m new here!' /etc/test.txt
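
The review-then-replace workflow can be wrapped in a few lines. This is only a sketch, assuming GNU sed (for the one-line form of a); the temporary paths and the variable names `file` and `tmp` are hypothetical stand-ins for /etc/test.txt and its .tmp copy.

```shell
#!/bin/sh
# Work on a throwaway copy rather than a real file under /etc.
file=$(mktemp)
printf '%s\n' 'line one' 'line two' > "$file"

# 1. Write the edited stream to a temporary file, and replace the
#    original only if sed succeeded and produced non-empty output.
tmp=$(mktemp)
if sed -e '$a appended via temp file' "$file" > "$tmp" && [ -s "$tmp" ]; then
    mv -f "$tmp" "$file"
fi

# 2. Alternative: edit in place, keeping the previous version as FILE.bak.
sed -i.bak -e '$a appended in place' "$file"

result=$(cat "$file")
backup=$(cat "$file.bak")
printf '%s\n' "$result"
rm -f "$file" "$file.bak"
```

The address $ means the last line, so the script does not need to know how many lines the file has. Giving -i a suffix such as .bak softens the main objection to in-place editing, since the previous contents survive.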

Deleting specific lines with sed.

nl /etc/test.txt | sed '2d' # delete only line 2
nl /etc/test.txt | sed '3,$d' # delete from line 3 through the last line
# nl is a separate command that prints a file with line numbers added.
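
The d command accepts any sed address, not only line numbers. The sketch below runs a few variants on five made-up input lines so the effect of each address is easy to compare; the variable names are chosen just for this example.

```shell
#!/bin/sh
# Five short lines stand in for the test file.
input=$(printf '%s\n' one two three four five)

# Delete a single line by number.
no_second=$(printf '%s\n' "$input" | sed '2d')
# Delete from line 3 through the last line ($).
first_two=$(printf '%s\n' "$input" | sed '3,$d')
# Delete every line matching a regular expression instead of a line number.
no_t=$(printf '%s\n' "$input" | sed '/^t/d')
# Delete only the last line.
no_last=$(printf '%s\n' "$input" | sed '$d')

printf '%s\n' "$no_second" '--' "$first_two" '--' "$no_t" '--' "$no_last"
```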

Searching for lines that contain a given string with sed.

[root@localhost /]# nl /etc/test.txt | sed '/you/p'
     1  Hi,what's your name?
     1  Hi,what's your name?
     2  I'm very happy to meet you. 
     2  I'm very happy to meet you. 
     3  How do you do?
     3  How do you do?
     4  I'm fine,thanks.
     5  Would you like to drink sth?
     5  Would you like to drink sth?
     6  Myname is Linux.
     7  I wanna drink some coffee.
     8  It's a nice day,isn't it?
     9  Where are you from?
     9  Where are you from?
    10  I'm new here!

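Notice that the lines containing you are printed twice. sed prints every line by default, and p prints each matching line one more time. With -n, only the lines containing you are printed.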

[root@localhost /]# nl /etc/test.txt | sed -n '/you/p'
     1  Hi,what's your name?
     2  I'm very happy to meet you. 
     3  How do you do?
     5  Would you like to drink sth?
     9  Where are you from?
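
The option rundown also lists s (substitute) and i (insert), which are not demonstrated here. A brief sketch of both follows, assuming GNU sed for the one-line form of i; the three input lines are borrowed from the test file.

```shell
#!/bin/sh
input=$(printf '%s\n' 'How do you do?' "I'm fine,thanks." 'Where are you from?')

# s/old/new/ replaces the first match on each line; other lines pass through.
replaced=$(printf '%s\n' "$input" | sed 's/you/YOU/')

# With -n and the p flag, only lines where a substitution happened are printed.
only_changed=$(printf '%s\n' "$input" | sed -n 's/you/YOU/p')

# i inserts text before the addressed line.
inserted=$(printf '%s\n' "$input" | sed '1i Greetings:')

printf '%s\n' "$replaced" '--' "$only_changed" '--' "$inserted"
```

Adding the g flag, as in `s/you/YOU/g`, would replace every match on a line rather than only the first.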

3.awk

AWK is a language for processing text files and a powerful text-analysis tool.
awk syntax:

awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)

There are quite a few options, so instead of listing them, let's look at examples.

[root@localhost /]# awk '{print $1,$4}' /etc/test.txt # print fields 1 and 4; by default fields are separated by runs of spaces or tabs
Hi,what's 
I'm to
How do?
I'm 
Would to
Myname 
I some
It's day,isn't
Where from?
I'm

The -F option sets the field separator. The test file contains quite a few apostrophes ('), so let's split on those.

[root@localhost /]# awk -F'  '{print $1,$4}' /etc/test.txt 
>
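# No output: the shell sees an unterminated quote and shows the > continuation prompt. A bare single quote has to be escaped for the shell (this is shell quoting, not regex escaping), so let's try a backslash.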

[root@localhost /]# awk -F\'  '{print $1,$4}' /etc/test.txt 
Hi,what 
I 
How do you do? 
I 
Would you like to drink sth? 
Myname is Linux. 
I wanna drink some coffee. 
It 
Where are you from? 
I
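
There are several equivalent ways to hand a single quote to awk as the separator. The sketch below compares three of them on one line taken from the test file; it is an illustration of shell quoting, and the variable names are made up for the example.

```shell
#!/bin/sh
line="It's a nice day,isn't it?"

# Escaping the quote for the shell with a backslash.
a=$(printf '%s\n' "$line" | awk -F\' '{print $1 "|" $2 "|" $3}')

# Wrapping the single quote in double quotes is equivalent.
b=$(printf '%s\n' "$line" | awk -F"'" '{print $1 "|" $2 "|" $3}')

# FS can also be set through -v; NF then reports how many fields resulted.
n=$(printf '%s\n' "$line" | awk -v FS="'" '{print NF}')

printf '%s\n' "$a" "$b" "fields=$n"
```

The line splits into three fields, which is why $4 comes out empty for it in the output above.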

Next come awk variables, which are well worth a look.

awk -v  # set a variable

For convenience, part of test.txt is modified first.

[root@localhost /]# cat /etc/test.txt
3 Hi,what's your name?
6 I'm very happy to meet you. 
How do you do?
9 I'm fine,thanks.
Would you like to drink sth?
0 Myname is Linux.
1 I wanna drink some coffee.
It's a nice day,isn't it?
5 Where are you from?
I'm new here!

Compare the following two commands.

[root@localhost /]# awk -va=1 '{print $1,$1+a}' /etc/test.txt
3 4
6 7
How 1
9 10
Would 1
0 1
1 2
It's 1
5 6
I'm 1

[root@localhost /]# awk -va=1 '{print $1,$(1+a)}' /etc/test.txt
3 Hi,what's
6 I'm
How do
9 I'm
Would you
0 Myname
1 I
It's a
5 Where
I'm new

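The first result looks a little odd. $1+a adds a (which is 1) to the value of the first field, so the lines that start with a number are easy to follow. When the first field is a string such as How, it counts as 0 in arithmetic, so those lines all give 1. In the second command, $(1+a) is the field whose number is 1+a, so it simply prints the first and second fields. With that in mind, the next command should not be hard to read: $1b concatenates the first field with the value of b.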

[root@localhost /]# awk -va=1 -vb=s '{print $1,$1+a,$1b}' /etc/test.txt 
3 4 3s
6 7 6s
How 1 Hows
9 10 9s
Would 1 Woulds
0 1 0s
1 2 1s
It's 1 It'ss
5 6 5s
I'm 1 I'ms
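
A common use of -v is passing a shell variable into an awk program, since shell variables are not expanded inside the single-quoted script. The following sketch uses made-up input; `threshold` and `limit` are names chosen for this example.

```shell
#!/bin/sh
# -v copies the shell value into an awk variable before the program runs.
threshold=2
out=$(printf '%s\n' '1 apple' '2 pear' '3 plum' |
      awk -v limit="$threshold" '{ print $1, ($1 > limit ? "above" : "not above"), $1 "x" }')
printf '%s\n' "$out"
```

Both $1 and limit look like numbers here, so awk compares them numerically, while $1 "x" shows the same field used as a string.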

Filtering by value, for example selecting the lines whose first field is greater than 3.

[root@localhost /]# awk '$1>3' /etc/test.txt 
6 I'm very happy to meet you. 
How do you do?
9 I'm fine,thanks.
Would you like to drink sth?
It's a nice day,isn't it?
5 Where are you from?
I'm new here!

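You may wonder why none of the lines starting with a string were filtered out. The earlier rule, that a string counts as 0, applies to arithmetic but not to comparisons. In a comparison, awk first checks whether the field looks like a number (0123, for instance, is treated as 123). A field such as How does not, so awk compares it with 3 as a string, character by character. Letters sort after digits in ASCII, so "How" is greater than "3" and these lines pass the test.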
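
If only the numeric lines are wanted, the comparison can be forced to be numeric. The sketch below shows two ways to do that on a shortened, made-up version of the test data.

```shell
#!/bin/sh
input=$(printf '%s\n' '3 Hi' '6 I am' 'How do' '9 fine' '0 Myname' "It's a")

# Plain comparison: numeric-looking fields compare as numbers,
# anything else compares as a string against "3".
mixed=$(printf '%s\n' "$input" | awk '$1 > 3 { print $1 }')

# Adding 0 forces a numeric comparison; non-numeric fields become 0 and drop out.
numeric=$(printf '%s\n' "$input" | awk '$1 + 0 > 3 { print $1 }')

# Requiring an all-digit first field is another way to skip text lines.
digits=$(printf '%s\n' "$input" | awk '$1 ~ /^[0-9]+$/ && $1 > 3 { print $1 }')

printf '%s\n' "$mixed" '--' "$numeric" '--' "$digits"
```

The first command still lets How and It's through, while the other two keep only 6 and 9.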
Finally, a few practical awk examples.
(1) Print the 9x9 multiplication table

[root@localhost /]# seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
1x1=1
1x2=2   2x2=4
1x3=3   2x3=6   3x3=9
1x4=4   2x4=8   3x4=12  4x4=16
1x5=5   2x5=10  3x5=15  4x5=20  5x5=25
1x6=6   2x6=12  3x6=18  4x6=24  5x6=30  6x6=36
1x7=7   2x7=14  3x7=21  4x7=28  5x7=35  6x7=42  7x7=49
1x8=8   2x8=16  3x8=24  4x8=32  5x8=40  6x8=48  7x8=56  8x8=64
1x9=9   2x9=18  3x9=27  4x9=36  5x9=45  6x9=54  7x9=63  8x9=72  9x9=81
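
The one-liner is dense, so it may help to look at its stages separately. The sketch below uses 3 instead of 9 to keep the output short, assumes seq is available, and ends with a plain nested-loop version in awk alone, which is my own alternative and not part of the original command.

```shell
#!/bin/sh
# Step 1: H appends each input line to the hold space, g copies the hold space
# back, so line k is printed as an empty line followed by the numbers 1..k.
blocks=$(seq 3 | sed 'H;g')
printf '%s\n' "$blocks"

# Step 2: with RS='' awk reads blank-line-separated blocks as records and also
# splits fields on newlines, so record NR has NF == NR fields.
counts=$(seq 3 | sed 'H;g' | awk -v RS='' '{ print NR ":" NF }')
printf '%s\n' "$counts"

# The same table can be produced by awk alone with two nested loops.
table=$(awk 'BEGIN {
    for (row = 1; row <= 3; row++)
        for (col = 1; col <= row; col++)
            printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
}')
printf '%s\n' "$table"
```

In the original command the loop index i plays the role of col and NR plays the role of row.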

(2) Filter by disk usage

[root@localhost ~]# df -h # show the file systems
Filesystem           Size  Used Avail  Use%  Mounted on
/dev/mapper/cl-root   50G  2.0G   49G    4%  /
devtmpfs             7.8G     0  7.8G    0%  /dev
tmpfs                7.8G     0  7.8G    0%  /dev/shm
tmpfs                7.8G   12M  7.8G    1%  /run
tmpfs                7.8G     0  7.8G    0%  /sys/fs/cgroup
/dev/sda1           1014M  177M  838M   18%  /boot
/dev/mapper/cl-home  441G   33M  441G    1%  /home
tmpfs                1.6G     0  1.6G    0%  /run/user/0

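Suppose we want the devices whose usage is above 3%. Counting columns, "Use%" is the fifth, that is $5, but its values end in %. One approach is to split on runs of spaces and %, so that the fifth field becomes a plain number. The header line is printed too, because its fifth field is text and is therefore compared with 3 as a string.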

[root@localhost ~]# df -h | awk -F'[ %]+' '{if($5>3)print}'
Filesystem           Size  Used Avail  Use% Mounted on
/dev/mapper/cl-root   50G  2.0G   49G    4% /
/dev/sda1           1014M  177M  838M   18% /boot
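
A variant that drops the header and keeps the default field separator is sketched below. It runs on a captured sample of df output (a shortened copy of the listing above, with English headers) so that the result does not depend on the machine; the variable names are made up for the example.

```shell
#!/bin/sh
# Captured sample of `df -h` output, so the example runs the same everywhere.
sample='Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cl-root   50G  2.0G   49G   4% /
devtmpfs             7.8G     0  7.8G   0% /dev
tmpfs                7.8G   12M  7.8G   1% /run
/dev/sda1           1014M  177M  838M  18% /boot'

# NR > 1 skips the header; $5 + 0 turns "18%" into the number 18,
# because awk uses the leading numeric part of a string in arithmetic.
busy=$(printf '%s\n' "$sample" | awk 'NR > 1 && $5 + 0 > 3 { print $1, $5, $6 }')
printf '%s\n' "$busy"
```

On a live system the same awk program can read from `df -h` directly. Using `df -P` is worth considering there, since it keeps each file system on one line even when the device name is long.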