awk Command Explained

Latest recommended article published 2024-06-07 12:37:33

兔子王cool

Article link: https://blog.csdn.net/pokes/article/details/122691171

The awk command

Reference: https://blog.csdn.net/u010502101/article/details/81839519

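AWK is a data-filtering tool (similar to grep, but more powerful) and a data-processing engine: it checks input text against patterns, processing and printing it line by line. It is usually used in shell scripts to extract specific data; used on its own, it can also compute statistics over text data.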

Syntax

Format 1: preceding-command | awk [options] 'pattern{action}'

Format 2: awk [options] 'pattern{action}' file…

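When the action contains several statements, separate them with semicolons. If no separator is specified, fields are split on spaces and tabs by default. print is the most commonly used action.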
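To make the two formats concrete, here is a minimal sketch (POSIX sh and any POSIX awk assumed). The temporary file and its sample line are made up for the demo; both commands run the same two-statement action.

```shell
# Sketch: both invocation forms run the same program.
# The temporary file and its one line of sample data are made up.
tmp=$(mktemp)
printf 'The dog:There is a big dog\n' > "$tmp"

# Two statements in one action, separated by a semicolon
from_pipe=$(cat "$tmp" | awk -F: '{print $1; print $2}')   # Format 1: input from a pipe
from_file=$(awk -F: '{print $1; print $2}' "$tmp")         # Format 2: input from a file
rm -f "$tmp"

echo "$from_file"
# The dog
# There is a big dog
```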

Options

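-F: specify the field separator; optional (the default is spaces or tabs)

-v: assign a variable before the program runs, typically to pass in an external shell variable: -v name=value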
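A minimal sketch of -v, assuming POSIX sh. The names label and lbl are hypothetical; -v copies the shell value into the awk variable before any input is read.

```shell
# Hypothetical shell variable to pass into awk
label="Animal"

# -v lbl="$label" creates the awk variable lbl with the shell value
result=$(printf 'The dog:There is a big dog\n' | awk -F: -v lbl="$label" '{print lbl ": " $1}')
echo "$result"   # Animal: The dog
```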

1. Field variables

awk automatically assigns the fields it splits out to the field variables:

$0 is the whole line
$1 is the first field of the line
$2 is the second field of the line
$n is the nth field of the line
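The n in $n can itself be a variable. In this sketch n is set with -v (the name is arbitrary) and the input line is made up.

```shell
# $n means "field number n"; here n is 3, so $n is the third field.
# $0 is still the whole, unsplit line.
result=$(printf 'a:b:c\n' | awk -F: -v n=3 '{print $0 " -> " $n}')
echo "$result"   # a:b:c -> c
```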

[root@xuniji01 ~]# cat test.txt 
The dog:There is a big dog and a little dog in the park
The cat:There is a big cat and a little cat in the park
The tiger:There is a big tiger and a litle tiger in the park

Use -F to set ":" as the field separator, which splits each line into two fields, then print the second field, $2.

[root@xuniji01 ~]# awk -F: '{print $2}' test.txt  
There is a big dog and a little dog in the park
There is a big cat and a little cat in the park
There is a big tiger and a litle tiger in the park

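Note: if no field separator is specified explicitly, awk splits fields on runs of whitespace (spaces and tabs, plus newlines when a record spans several lines), and leading and trailing whitespace is ignored.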
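A quick check of the default splitting, assuming any POSIX awk. NF is the built-in count of fields in the current record; it is not covered above and is used here only to show that exactly three fields are found.

```shell
# Default FS: leading blanks are skipped and a run of spaces/tabs
# counts as one separator, so this line has exactly three fields.
result=$(printf '   alpha \t  beta   gamma\n' | awk '{print NF ": " $1 "|" $2 "|" $3}')
echo "$result"   # 3: alpha|beta|gamma
```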

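2. Using multiple commands in a script

Separate the commands with a semicolon. Assigning a new value to $1 makes awk rebuild $0 by joining the fields with OFS (a single space by default), which is why the ":" turns into a space in the output.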

[root@xuniji01 ~]# awk -F: '{$1="Description:"; print $0}' test.txt 
Description: There is a big dog and a little dog in the park
Description: There is a big cat and a little cat in the park
Description: There is a big tiger and a litle tiger in the park

3. Reading the program from a file

Save the program in a file (pokes.txt here) and load it with -f:

[root@xuniji01 ~]# vim pokes.txt
{
$1="Description:"
print $0
}

[root@xuniji01 ~]# awk -F: -f pokes.txt test.txt 
Description: There is a big dog and a little dog in the park
Description: There is a big cat and a little cat in the park
Description: There is a big tiger and a litle tiger in the park
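The same steps can be reproduced without an editor. This sketch writes the program with a here-document into a temporary directory; the file names prog.awk and test.txt inside it are placeholders.

```shell
# Sketch: write the program to a file, then load it with -f.
dir=$(mktemp -d)
cat > "$dir/prog.awk" <<'EOF'
{
    $1 = "Description:"   # replace the first field
    print $0              # $0 is rebuilt, joined with OFS (a space)
}
EOF
printf 'The dog:There is a big dog\n' > "$dir/test.txt"

result=$(awk -F: -f "$dir/prog.awk" "$dir/test.txt")
echo "$result"   # Description: There is a big dog
rm -rf "$dir"
```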

4. Running commands before processing data

By default awk reads one line at a time and processes it with the program. To run some commands before any input is processed, put them in a BEGIN block.

[root@xuniji01 ~]# awk -F: 'BEGIN{print "开始处理..."}{print $2}' test.txt 
开始处理...
There is a big dog and a little dog in the park
There is a big cat and a little cat in the park
There is a big tiger and a litle tiger in the park
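BEGIN runs exactly once, before the first line is read, so it is a natural place to set FS and print a header. A small sketch with made-up data:

```shell
# FS is set in BEGIN, so it already applies to the first line.
# The header is printed once, before any data line.
result=$(printf 'The dog:big\nThe cat:small\n' |
  awk 'BEGIN { FS = ":"; print "name,size" } { print $1 "," $2 }')
echo "$result"
# name,size
# The dog,big
# The cat,small
```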

5. Running commands after processing data

Use an END block for wrap-up work that runs after all input has been processed.

[root@xuniji01 ~]# awk -F: '{print $2} END{print "处理结束..."}' test.txt 
There is a big dog and a little dog in the park
There is a big cat and a little cat in the park
There is a big tiger and a litle tiger in the park
处理结束...
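Because END runs once after the last line, it is where totals are usually printed. In this sketch, total is a user-defined variable (uninitialized awk variables start at 0) and NR is the built-in record counter, which holds the total number of lines inside END. The input is made up.

```shell
# Accumulate per line, report once in END.
result=$(printf 'dog\ncat\ntiger\n' |
  awk '{ total += length($0) } END { print NR " lines, " total " chars" }')
echo "$result"   # 3 lines, 11 chars
```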

6. Using variables in the program

There are two kinds of variables: awk's built-in variables and user-defined variables.

Built-in variables

Separator-related variables

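FS: input field separator
OFS: output field separator
RS: input record separator
ORS: output record separator
FIELDWIDTHS: fixed widths of the data fields (a gawk extension, not available in every awk)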

Using FS

[root@xuniji01 ~]# cat test.txt 
The dog:There is a big dog and a little dog in the park
The cat:There is a big cat and a little cat in the park
The tiger:There is a big tiger and a litle tiger in the park

[root@xuniji01 ~]# awk 'BEGIN{FS=":"} {print $1, $2}' test.txt    # FS sets the field separator to ":", splitting each line into two fields
The dog There is a big dog and a little dog in the park
The cat There is a big cat and a little cat in the park
The tiger There is a big tiger and a litle tiger in the park
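Besides a single character, FS can be a regular expression: a value longer than one character is treated as an extended regular expression. This sketch (sample data made up) splits on either ':' or ','.

```shell
# FS="[:,]" is a bracket expression: either ':' or ',' separates fields.
result=$(printf 'dog:big,park\n' | awk 'BEGIN{FS="[:,]"} {print $3, $2, $1}')
echo "$result"   # park big dog
```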

Using OFS

After FS sets the input field separator to ":", each line has two fields; on output, OFS joins the two fields with ">".

[root@xuniji01 ~]# cat test.txt 
The dog:There is a big dog and a little dog in the park
The cat:There is a big cat and a little cat in the park
The tiger:There is a big tiger and a litle tiger in the park

[root@xuniji01 ~]# awk 'BEGIN{FS=":"; OFS=">"} {print $1, $2}' test.txt   # FS splits on ":"; print $1, $2 joins the fields with OFS, so ">" appears where ":" was
The dog>There is a big dog and a little dog in the park
The cat>There is a big cat and a little cat in the park
The tiger>There is a big tiger and a litle tiger in the park
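One subtlety worth knowing: OFS is applied only when awk builds output from fields, i.e. in print $1, $2 or when $0 is rebuilt after a field assignment. A bare print outputs the original line unchanged, so the idiom $1 = $1 is often used to force the rebuild. Sketch with one line of the sample data:

```shell
line='The dog:There is a big dog'

# Bare print: $0 was never rebuilt, so OFS is not used
plain=$(echo "$line" | awk 'BEGIN{FS=":"; OFS=">"} {print}')

# Assigning any field (even to itself) rebuilds $0 with OFS
rebuilt=$(echo "$line" | awk 'BEGIN{FS=":"; OFS=">"} {$1 = $1; print}')

echo "$plain"     # The dog:There is a big dog
echo "$rebuilt"   # The dog>There is a big dog
```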

Using RS and ORS

By default RS and ORS are both "\n": each line of input is one record, and records are separated by "\n" on output.
The file a.txt is used as an example; its contents are:

[root@xuniji01 ~]# cat a.txt 
Tom is a student
and he is 20 years old

Bob is a teacher
and he is 40 years old


By default each line is a separate record, but here we want lines 1 and 2 to form one record, and lines 3 and 4 another.

[root@xuniji01 ~]# awk 'BEGIN{RS=""; ORS="\n"; FS="and"; OFS=","} {print $1, $2}' a.txt  
Tom is a student
, he is 20 years old
Bob is a teacher
, he is 40 years old

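RS="" turns on paragraph mode: records are separated by blank lines, so the first two lines and the last two lines each form one record. FS="and" splits each record at "and", and print $1, $2 joins the two parts with OFS (","). The newline at the end of $1 is kept, which is why each comma appears at the start of the next line.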
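Paragraph mode can also be used to join each record back into a single line. A sketch on the same a.txt contents (fed through printf so it is self-contained); gsub replaces the newline inside each record with a space, and NR numbers the records.

```shell
# RS="": blank lines separate records; each record keeps its inner newline,
# which gsub then turns into a space.
result=$(printf 'Tom is a student\nand he is 20 years old\n\nBob is a teacher\nand he is 40 years old\n' |
  awk 'BEGIN{RS=""} {gsub(/\n/, " "); print NR ": " $0}')
echo "$result"
# 1: Tom is a student and he is 20 years old
# 2: Bob is a teacher and he is 40 years old
```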

Common extraction tasks

Extracting the IP address

## First, look up the IP address
[root@xuniji01 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:0b:f5:03 brd ff:ff:ff:ff:ff:ff
    inet 10.5.6.244/24 brd 10.5.6.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::b268:c536:3f01:4c85/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

## Pull out the line containing "global"
[root@xuniji01 ~]# ip add | grep global
    inet 10.5.6.244/24 brd 10.5.6.255 scope global noprefixroute eth0
    
## Then print the second whitespace-separated column
[root@xuniji01 ~]# ip add | grep global | awk '{print $2}'
10.5.6.244/24

## Then print the first column, using "/24" as the separator
[root@xuniji01 ~]# ip add | grep global | awk '{print $2}' | awk -F/24 '{print $1}'   # first column when split on "/24"
10.5.6.244

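That's it. Note that -F/24 only splits on the literal string "/24", so an address with another prefix length (such as /16) would be printed unchanged; -F/ works for any prefix length.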

## To extract the broadcast address, print the fourth column

[root@xuniji01 ~]# ip add | grep global | awk '{print $4}'
10.5.6.255
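The grep plus two awk calls can be collapsed into one awk program. In this sketch the /global/ pattern plays the role of grep, and split() cuts the address at "/", which works for any prefix length. A sample line stands in for real ip add output; on a live system the equivalent is ip add | awk '/global/ {split($2, a, "/"); print a[1]}', which may print several lines if more than one global address (including IPv6) is configured.

```shell
# Sample line copied from the ip add output above
sample='    inet 10.5.6.244/24 brd 10.5.6.255 scope global noprefixroute eth0'

# /global/ replaces grep; split() stores "10.5.6.244" in a[1] and "24" in a[2]
ip=$(echo "$sample" | awk '/global/ {split($2, a, "/"); print a[1]}')

# Broadcast address: fourth whitespace-separated field
brd=$(echo "$sample" | awk '/global/ {print $4}')

echo "$ip $brd"   # 10.5.6.244 10.5.6.255
```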

Extracting the available space on /home

[root@xuniji01 ~]# df -h
Filesystem               Size  Used Avail  Use% Mounted on
devtmpfs                 904M     0  904M    0% /dev
tmpfs                    915M     0  915M    0% /dev/shm
tmpfs                    915M  8.5M  907M    1% /run
tmpfs                    915M     0  915M    0% /sys/fs/cgroup
/dev/mapper/centos-root   50G  2.0G   48G    4% /
/dev/sda1               1014M  180M  835M   18% /boot
/dev/mapper/centos-home   74G   33M   74G    1% /home
tmpfs                    183M     0  183M    0% /run/user/0

[root@xuniji01 ~]# df -h | grep home
/dev/mapper/centos-home   74G   33M   74G    1% /home

[root@xuniji01 ~]# df -h | grep home |  awk '{print $4}'
74G
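Likewise, awk can do the matching itself. $NF is the last field, here the mount point; comparing it exactly avoids accidental matches such as a /home2 mount point. Two lines of the df -h output above are used so the sketch runs anywhere; on a real system, pipe df -h into the same awk program.

```shell
# Two lines copied from the df -h output above
sample='/dev/mapper/centos-home   74G   33M   74G    1% /home
tmpfs                    183M     0  183M    0% /run/user/0'

# $NF is the mount point; $4 is the Avail column
avail=$(echo "$sample" | awk '$NF == "/home" {print $4}')
echo "$avail"   # 74G
```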