Three Swordsmen Training Cheat Sheet (Part 1): awk, from cripple to full ascension through tribulation

小鲸鱼大梦想

Last modified: 2023-02-04 23:58:12

First published: 2023-02-02 14:29:58

Article link: https://blog.csdn.net/whale0306/article/details/128849802

Three Swordsmen training cheat sheet

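grep, sed, and awk are known as the "three swordsmen" of Linux:

grep is best suited to simply searching for or matching text
sed is best suited to editing the text it matches
awk is best suited to formatting text and handling more complex text processing

This cheat sheet is based on Zhu Shuangyin's personal blog (https://www.zsythink.net/). Thanks to him for such excellent technical articles.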

What is awk

The professional definition: awk is a report generator with powerful text-formatting capabilities.

Alfred Aho
Peter Weinberger
Brian Kernighan

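These three people created it, and awk is named after the initials of their surnames.

awk is in fact a programming language: it supports conditionals, arrays, loops, and more. So we can also think of awk as an interpreter for a scripting language.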

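awk basics

Here is the basic syntax: awk [options] 'Pattern{Action}' file

Using the simplest action

In the example below, the output of df is piped to awk. The action is the most basic one, print, and both options and Pattern are omitted.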

[root@yunmx scripts]# df | awk '{print}'
Filesystem     1K-blocks     Used Available Use% Mounted on
devtmpfs         1928328        0   1928328   0% /dev
tmpfs            1939936       24   1939912   1% /dev/shm
tmpfs            1939936      764   1939172   1% /run
tmpfs            1939936        0   1939936   0% /sys/fs/cgroup
/dev/vda1       82437508 10174348  68807656  13% /
tmpfs             387988        0    387988   0% /run/user/0
overlay         82437508 10174348  68807656  13% /var/lib/docker/overlay2/6f07df83e16117bd1d2c527181fbe38c56b5bab9c30add69e0030b687b3ccd91/merged
overlay         82437508 10174348  68807656  13% /var/lib/docker/overlay2/2ff6bc7de4b9ce70e17ff56b4d3801837d27fd30470193842a1c513ee86af075/merged
tmpfs             387988        0    387988   0% /run/user/1001
[root@yunmx scripts]#

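Next, only the second column of the df output is printed. $2 is the second field of the current line after it has been split by the separator. When no separator is specified, awk splits on whitespace (spaces and tabs) and treats a run of consecutive whitespace as a single separator.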

[root@yunmx scripts]# df | awk '{print $2}'
1K-blocks
1928328
1939936
1939936
1939936
82437508
387988
82437508
82437508
387988
[root@yunmx scripts]#

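So:

awk processes data line by line
By default the newline character marks the end of each line (record)
Both the field separator and the line (record) separator can be changed
$0 is the whole line, and $NF is the last field of the current line after splitting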
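As a small sketch of the field variables listed above, the command below feeds two made-up lines to awk and prints the field count, the first field, and the last field of each. The sample text and the shell variable result are only for illustration.

```shell
# Two sample lines with uneven spacing; a run of spaces counts as one separator.
result=$(printf 'a b   c\nd e\n' | awk '{print NF, $1, $NF}')
echo "$result"
```

The first line has three fields and the second has two, so $NF refers to a different column on each line.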

A few more examples:

$0 prints the whole line

[root@yunmx scripts]# cat test.txt
111 #$% 11 # 1
222 #$% 22 # 2
333 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5

[root@yunmx scripts]# awk '{print $0}' test.txt
111 #$% 11 # 1
222 #$% 22 # 2
333 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5

[root@yunmx scripts]#

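You can also print several columns at once, separated by commas. If you leave the commas out, the fields are concatenated with nothing between them; try it and see.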

[root@yunmx scripts]# awk '{print $1,$2,$3}' test.txt
111 #$% 11
222 #$% 22
333 #$% 33
444 #$% 44
555 #$% 55

[root@yunmx scripts]#

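You can also print your own strings along with the fields, but don't forget the double quotes.

Why does my last line look like this? Because test.txt ends with an empty line: all of its fields are empty, so only the literal string is printed.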

[root@yunmx scripts]# awk '{print $1,"好大个烟锅巴踩不媳嘛"$2,$3}' test.txt
111 好大个烟锅巴踩不媳嘛#$% 11
222 好大个烟锅巴踩不媳嘛#$% 22
333 好大个烟锅巴踩不媳嘛#$% 33
444 好大个烟锅巴踩不媳嘛#$% 44
555 好大个烟锅巴踩不媳嘛#$% 55
 好大个烟锅巴踩不媳嘛

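awk can flexibly join the characters we specify to each column, or insert them as a new column among the original ones. This is awk's text-formatting ability in action.

awk patterns

A quick recap of the basic awk syntax: awk [options] 'Pattern{Action}' file

The examples above only used the most common action: print

Now let's look at the Pattern part of awk

We start with awk's two special patterns: BEGIN and END

I know these two words: beginning and end
Clearly they are what awk does before processing the input and what it does after processing it

Here we go: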

[root@yunmx scripts]# awk 'BEGIN{print "这是BEGIN"}' test.txt
这是BEGIN
[root@yunmx scripts]#

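In the example above, awk prints a string before processing any input. Since the program contains nothing but BEGIN, nothing else happens, so only that string is printed.

One more: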

[root@yunmx scripts]# awk 'BEGIN{print "这是BEGIN"}{print $1}' test.txt
这是BEGIN
111
222
333
444
555
[root@yunmx scripts]#

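This time BEGIN is combined with the file we want to process

So the BEGIN pattern runs its action before awk starts processing the text line by line

END needs little explanation; the example below says it all: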

[root@yunmx scripts]# awk 'BEGIN{print "这是BEGIN"}{print $1}END{print "这是END"}' test.txt
这是BEGIN
111
222
333
444
555
这是END
[root@yunmx scripts]#

This is a bit like a table header, body, and footer
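A typical use of the header / body / footer shape is a running total. The sketch below uses three made-up numbers as input; the variable names sum and result are illustrative only.

```shell
# BEGIN runs once before any input, the middle block once per line,
# and END once after the last line has been read.
result=$(printf '3\n4\n5\n' |
  awk 'BEGIN{print "values"} {sum += $1; print $1} END{print "total", sum}')
echo "$result"
```

The variable sum needs no initialization: an unset awk variable behaves as 0 in a numeric context.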

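awk separators

There are two kinds of separator:

Input field separator (FS): whitespace by default (one or more spaces or tabs)
Output field separator (OFS): a single space by default

Look at the example below and get a feel for it: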

[root@yunmx scripts]# cat test.txt
111 #$% 11 # 1
222 #$% 22 # 2
333 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5
[root@yunmx scripts]# awk -v FS='#' -v OFS='这是输出分隔符' '{print $1,$2}' test.txt
111 这是输出分隔符$% 11
222 这是输出分隔符$% 22
333 这是输出分隔符$% 33
444 这是输出分隔符$% 44
555 这是输出分隔符$% 55
[root@yunmx scripts]#
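One detail that is easy to miss: OFS only shows up where print has a comma, or when awk rebuilds the line after a field is assigned. A minimal sketch, using a made-up passwd-style line and the -F shorthand for -v FS=:

```shell
# OFS is inserted wherever a comma appears in print.
with_commas=$(echo 'root:x:0' | awk -F: -v OFS='-' '{print $1, $3}')
# $0 keeps its original separators until a field is assigned;
# the no-op assignment $1 = $1 forces a rebuild using OFS.
rebuilt=$(echo 'root:x:0' | awk -F: -v OFS='-' '{$1 = $1; print}')
echo "$with_commas"
echo "$rebuilt"
```

The first command prints root-0 and the second prints root-x-0.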

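awk variables

Built-in variables

Variable	Meaning
FS	Input field separator; whitespace by default
OFS	Output field separator; a single space by default
RS	Input record (line) separator; a newline by default
ORS	Output record (line) separator, printed after each record in place of the newline
NF	Number of fields in the current line, i.e. how many columns the line was split into
NR	Number of the record (line) currently being processed, counted across all input files
FNR	Record (line) number counted separately for each file
FILENAME	Name of the current file
ARGC	Number of command-line arguments
ARGV	Array holding the command-line arguments

First example: built-in variables are written without $. The $ sign is the field operator, so NF is the number of fields while $NF is the last field. The command below prints the current line number and the number of fields on that line.

Training tip: printing NR followed by the content gives you an alternative way to show line numbers, although it is not especially practical.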

[root@yunmx scripts]# awk '{print NR,NF}' test.txt
1 5
2 5
3 5
4 5
5 5
[root@yunmx scripts]# cat test.txt
111 #$% 11 # 1
222 #$% 22 # 2
333 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5
[root@yunmx scripts]#

Now with several files: awk also works through them one file at a time, line by line

As you can see, with FNR awk counts the lines of each file separately

[root@yunmx scripts]# awk '{print FNR,$0}' test.txt passwd
1 111 #$% 11 # 1
2 222 #$% 22 # 2
3 333 #$% 33 # 3
4 444 #$% 44 # 4
5 555 #$% 55 # 5
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 adm:x:3:4:adm:/var/adm:/sbin/nologin
5 nginx:x:995:992:Nginx web server:/var/lib/nginx:/sbin/nologin
6 grafana:x:994:991:grafana user:/usr/share/grafana:/sbin/nologin
[root@yunmx scripts]#
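To see NR and FNR side by side, here is a small sketch that creates two throwaway files in a temporary directory (the file names one.txt and two.txt are made up for the example):

```shell
# Build two tiny input files in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"
# NR keeps counting across files; FNR restarts at 1 for each file.
result=$(awk '{print NR, FNR, $0}' "$dir/one.txt" "$dir/two.txt")
echo "$result"
rm -r "$dir"
```

The last line of output is "3 1 c": the third record overall, but the first record of the second file.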

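RS and ORS: the input record separator and the output record separator. Both default to a newline.

Take a moment to let this sink in; I think I have got it.

############## Let's use 333 as the input record separator on the sample file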

[root@yunmx scripts]# awk -v RS='333' '{print $0}' test.txt
111 #$% 11 # 1
222 #$% 22 # 2

 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5

[root@yunmx scripts]# awk -v RS='333' -v ORS='你瞅啥' '{print $0}' test.txt
111 #$% 11 # 1
222 #$% 22 # 2
你瞅啥 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5
你瞅啥[root@yunmx scripts]#
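A smaller sketch of the same idea, using a single-character RS on a made-up comma-separated string. Note that a multi-character RS such as the 333 above works in gawk and mawk, but POSIX only defines the behavior for a single character, so the portable form is shown here.

```shell
# Split the input into records at each comma, and end every output record with "|".
result=$(printf 'a,b,c' | awk -v RS=',' -v ORS='|' '{print}')
echo "$result"
```

Each of the three records is printed followed by ORS, giving a|b|c| with no newline added by awk.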

FILENAME: holds the name of the file being processed; not much more to say

[root@yunmx scripts]# awk '{print FILENAME,$0}' test.txt passwd
test.txt 111 #$% 11 # 1
test.txt 222 #$% 22 # 2
test.txt 333 #$% 33 # 3
test.txt 444 #$% 44 # 4
test.txt 555 #$% 55 # 5
passwd root:x:0:0:root:/root:/bin/bash
passwd bin:x:1:1:bin:/bin:/sbin/nologin
passwd daemon:x:2:2:daemon:/sbin:/sbin/nologin
passwd adm:x:3:4:adm:/var/adm:/sbin/nologin
passwd nginx:x:995:992:Nginx web server:/var/lib/nginx:/sbin/nologin
passwd grafana:x:994:991:grafana user:/usr/share/grafana:/sbin/nologin
[root@yunmx scripts]#

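ARGC and ARGV
- ARGV: a built-in array that holds the arguments given on the command line.
  
  Elements are accessed by index. ARGV[0] is awk itself; that is simply how it is defined. Options and the program text are not stored in ARGV.
- ARGC: simply the number of arguments, i.e. the length of the array above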

[root@yunmx scripts]# awk '{print $1}' test.txt
111
222
333
444
555
[root@yunmx scripts]# awk '{print $1,ARGV[0]}' test.txt
111 awk
222 awk
333 awk
444 awk
555 awk
[root@yunmx scripts]# awk '{print $1,ARGV[0],ARGV[1]}' test.txt
111 awk test.txt
222 awk test.txt
333 awk test.txt
444 awk test.txt
555 awk test.txt
[root@yunmx scripts]# awk '{print $1,ARGV[0],ARGV[1],ARGC}' test.txt
111 awk test.txt 2
222 awk test.txt 2
333 awk test.txt 2
444 awk test.txt 2
555 awk test.txt 2
[root@yunmx scripts]#
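The arguments can also be walked in a loop. In this sketch the words one and two are made-up arguments, not real files; because the program consists only of a BEGIN block, awk never tries to open them.

```shell
# Start at 1 to skip ARGV[0], which holds the name of the awk program itself.
result=$(awk 'BEGIN{for (i = 1; i < ARGC; i++) print i, ARGV[i]}' one two)
echo "$result"
```

ARGC is 3 here (awk, one, two), so the loop visits indexes 1 and 2.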

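Custom variables

There are two ways to define our own variables:

-v varname=value (variable names are case sensitive)
Define it directly in the program; separate the assignment from the following statements with a semicolon ";"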

[root@yunmx scripts]# awk -v myname="whale" '{print $0,myname}' test.txt
111 #$% 11 # 1 whale
222 #$% 22 # 2 whale
333 #$% 33 # 3 whale
444 #$% 44 # 4 whale
555 #$% 55 # 5 whale
[root@yunmx scripts]#

[root@yunmx scripts]# awk 'BEGIN{myname="whale"}{print $0,myname}' test.txt
111 #$% 11 # 1 whale
222 #$% 22 # 2 whale
333 #$% 33 # 3 whale
444 #$% 44 # 4 whale
555 #$% 55 # 5 whale
[root@yunmx scripts]#

The advantage of the first method: it can take its value from a shell variable

[root@yunmx scripts]# echo $HOME
/root
[root@yunmx scripts]# awk -v myname=$HOME '{print $0,myname}' test.txt
111 #$% 11 # 1 /root
222 #$% 22 # 2 /root
333 #$% 33 # 3 /root
444 #$% 44 # 4 /root
555 #$% 55 # 5 /root
[root@yunmx scripts]#
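When the shell variable may contain spaces, it is safer to quote it, otherwise the shell splits the value before awk sees it. A minimal sketch with a made-up variable called name:

```shell
# Quote the expansion so the whole value reaches awk as one -v assignment.
name="hello world"
result=$(echo 'x' | awk -v v="$name" '{print $0, v}')
echo "$result"
```

One caveat worth knowing: awk interprets backslash escapes in -v values, so a value containing backslashes may not arrive unchanged.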

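On with the training; it feels like we have already survived several minor tribulations

awk formatting

So far the action used most has been print. It only does simple text output and cannot change the format of the text. For that we need another action: printf. I studied the basics of C not long ago, so this one feels familiar, haha

Basics

First, the syntax of printf. The shell's printf command separates its arguments with spaces:

printf "format" "text1" "text2" "text3" ...

In awk, the format and the arguments are separated by commas: printf "format", text1, text2, ...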

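Common format specifiers:

%s string

%f floating-point number (what we think of as float or double)

%b shell printf only, not available in awk: escape sequences inside the corresponding argument are interpreted

%c ASCII character; prints the first character of the corresponding argument

%d, %i decimal integer

%o unsigned octal value

%u unsigned decimal value

%x unsigned hexadecimal value, using a to f for 10 to 15

%X unsigned hexadecimal value, using A to F for 10 to 15

%% a literal "%"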

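Common escape sequences:

\a alert character, usually the ASCII BEL character

\b backspace

\c shell printf only (valid only in an argument string consumed by %b): suppresses any trailing newline in the output; any characters left in the argument, any following arguments, and anything left in the format string are ignored

\f form feed

\n newline

\r carriage return

\t horizontal tab

\v vertical tab

\\ a literal backslash character, i.e. "\" itself

\ddd the character with the 1 to 3 digit octal value ddd (in shell printf, valid only in the format string)

\0ddd the character with the 1 to 3 digit octal value ddd (shell printf, inside %b arguments)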

printf

Look at this example. Notice anything?

[root@yunmx scripts]# awk '{printf $1}' test.txt
111222333444555[root@yunmx scripts]# cat test.txt
111 #$% 11 # 1
222 #$% 22 # 2
333 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5
[root@yunmx scripts]#

Then keep pondering:

[root@yunmx scripts]# awk '{printf "%s\n",$1}' test.txt
111
222
333
444
555
[root@yunmx scripts]#

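Things to note when using printf:

1) printf does not add a newline to its output. If you need one, put "\n" in the format string, for example right after the format specifier

2) With printf, the format string and the values to be formatted must be separated by commas

3) With printf, the format specifiers in the format string must correspond one-to-one with the values to be formatted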

[root@yunmx scripts]# awk '{printf "第一列：%s//第二列：%s\n",$1,$2}' test.txt
第一列：111//第二列：#$%
第一列：222//第二列：#$%
第一列：333//第二列：#$%
第一列：444//第二列：#$%
第一列：555//第二列：#$%
[root@yunmx scripts]#

Here is a fun one: using the special patterns to print a table

[root@yunmx scripts]# awk -v FS=":" 'BEGIN{printf "%-10s\t %s\n","用户名称","用户ID"}{printf "%-10s\t %s\n",$1,$3}' passwd
用户名称         用户ID
root             0
bin              1
daemon           2
adm              3
nginx            995
grafana          994
[root@yunmx scripts]#
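The %-10s above is a width modifier. As a rough sketch of the common modifiers (the values "ab", 3.14159, and 7 are arbitrary): a minus sign left-aligns, a number sets the minimum width, .2 limits the decimals, and a leading 0 pads with zeros.

```shell
# %-6s: left-aligned in 6 columns; %5.2f: 5 columns wide, 2 decimals; %03d: zero-padded to 3 digits.
result=$(awk 'BEGIN{printf "%-6s|%5.2f|%03d\n", "ab", 3.14159, 7}')
echo "$result"
```

The vertical bars are only there to make the padding visible in the output.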

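awk patterns, continued

Another recap of the basic awk syntax: awk [options] 'Pattern {Action}' file1 file2 ...

Earlier in the training we covered the special patterns BEGIN and END

It is easier to think of a pattern as a "condition"
awk works line by line. With no condition, every line is processed until the input runs out; with a condition, only the lines that satisfy it are processed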

[root@yunmx scripts]# awk 'NR==1{print NR,$0}' passwd
1 root:x:0:0:root:/root:/bin/bash
[root@yunmx scripts]# awk '{print NR,$0}' passwd
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 adm:x:3:4:adm:/var/adm:/sbin/nologin
5 nginx:x:995:992:Nginx web server:/var/lib/nginx:/sbin/nologin
6 grafana:x:994:991:grafana user:/usr/share/grafana:/sbin/nologin
[root@yunmx scripts]# awk 'NR==2{print NR,$0}' passwd
2 bin:x:1:1:bin:/bin:/sbin/nologin
[root@yunmx scripts]#

Got it, right:

[root@yunmx scripts]# awk 'NR>=2{print NR,$0}' passwd
2 bin:x:1:1:bin:/bin:/sbin/nologin
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 adm:x:3:4:adm:/var/adm:/sbin/nologin
5 nginx:x:995:992:Nginx web server:/var/lib/nginx:/sbin/nologin
6 grafana:x:994:991:grafana user:/usr/share/grafana:/sbin/nologin
[root@yunmx scripts]#

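Relational operators:

Operator	Meaning	Example
<	less than	x < y
<=	less than or equal to	x <= y
==	equal to	x == y
!=	not equal to	x != y
>=	greater than or equal to	x >= y
>	greater than	x > y
~	true if the regex matches	x ~ /regex/
!~	true if the regex does not match	x !~ /regex/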
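These operators can be applied to individual fields and combined. The sketch below uses three made-up passwd-style lines and the && operator (logical AND, not covered above) to pick users whose ID is at least 1000 and whose shell ends in bash.

```shell
# Field 2 is compared numerically; field 3 is matched against a regex.
result=$(printf 'root:0:/bin/bash\nbin:1:/sbin/nologin\nweb:1001:/bin/bash\n' |
  awk -F: '$2 >= 1000 && $3 ~ /bash$/ {print $1}')
echo "$result"
```

Only the web line satisfies both conditions.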

Regex patterns

Enough talk, straight to the examples:

[root@yunmx scripts]# awk '/^root/{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
[root@yunmx scripts]#

[root@yunmx scripts]# awk '/\/bin\/bash$/{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
lighthouse:x:1000:1000::/home/lighthouse:/bin/bash
halo:x:1001:1001::/home/halo:/bin/bash
[root@yunmx scripts]#

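The takeaways:

In awk, a regex is written between two slashes: /regex/
If the regex itself contains a /, it must be escaped as \/

Notes:

1. Regexes used as awk patterns follow the "extended regular expression" syntax

2. Interval expressions such as {x,y} need the --posix or --re-interval option on older gawk versions; since gawk 4.0 they are enabled by default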
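If escaping every slash gets tiresome, one alternative is to give the regex as a string on the right-hand side of ~, where slashes need no escaping. A small sketch with two made-up lines:

```shell
# The string "/bin/bash$" is used as a regex; no \/ escaping is needed inside quotes.
result=$(printf 'root:/bin/bash\nbin:/sbin/nologin\n' |
  awk -F: '$2 ~ "/bin/bash$" {print $1}')
echo "$result"
```

The trade-off is that backslashes inside such a string are processed twice (once as a string, once as a regex), so patterns that need backslashes are usually easier to write in the /regex/ form.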

Line-range patterns

awk '/start regex/,/end regex/{action}' file

[root@yunmx scripts]# awk '/^root/,/^halt/{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
[root@yunmx scripts]#
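The two ends of a range do not have to be regexes; any condition works, including line numbers. A sketch with made-up input:

```shell
# Range by line number: from line 3 through line 5.
by_number=$(printf '%s\n' 1 2 3 4 5 6 | awk 'NR==3,NR==5')
# Range by regex: from the first line starting with "start" through the next line starting with "stop".
by_regex=$(printf 'x\nstart\ny\nstop\nz\n' | awk '/^start/,/^stop/')
echo "$by_number"
echo "$by_regex"
```

Both ends of the range are included in the output, and with no action given awk prints the matching lines.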

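awk actions 1: control statements

awk '{print $0}' test.txt

The outermost braces {} of the action: a grouping statement that combines several statements into one block
The content inside the braces, print $0: an output statement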

[root@yunmx scripts]# awk '{print $1}{print $2}' test.txt
111
#$%
222
#$%
333
#$%
444
#$%
555
#$%
[root@yunmx scripts]# cat test.txt
111 #$% 11 # 1
222 #$% 22 # 2
333 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5
[root@yunmx scripts]#

They can also go in the same block, separated by a semicolon:

[root@yunmx scripts]# awk '{print $1;print $2}' test.txt
111
#$%
222
#$%
333
#$%
444
#$%
555
#$%
[root@yunmx scripts]#

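Another kind of action: control statements

if: conditional

Why so many braces? The outer pair delimits the action and the inner pair is the body of the if. Worth thinking over; I only half get it myself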

[root@yunmx scripts]# awk '{if(NR == 1){print $0}}' test.txt
111 #$% 11 # 1
[root@yunmx scripts]#

if else:

if(condition)
{
statement1;
statement2;
...
}
else
{
statement1;
statement2;
...
}

[root@yunmx scripts]# awk '{if(NR == 1){print NR,$0}else{print NR,"不符合条件"}}' test.txt
1 111 #$% 11 # 1
2 不符合条件
3 不符合条件
4 不符合条件
5 不符合条件
[root@yunmx scripts]# cat test.txt
111 #$% 11 # 1
222 #$% 22 # 2
333 #$% 33 # 3
444 #$% 44 # 4
555 #$% 55 # 5
[root@yunmx scripts]#

if ... else if ... else:

if(condition1)
{
statement1;
statement2;
...
}
else if(condition2)
{
statement1;
statement2;
...
}
else
{
statement1;
statement2;
...
}

[root@yunmx scripts]# awk '{if(NR == 2){print $0}else if(NR == 3){print $0}else{print "不符合条件"}}' test.txt
不符合条件
222 #$% 22 # 2
333 #$% 33 # 3
不符合条件
不符合条件
[root@yunmx scripts]#
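The same chain can test field values instead of line numbers. A sketch that grades three made-up scores (the thresholds 90 and 60 and the variable grade are arbitrary):

```shell
# Each input line holds one score; the first matching branch sets the grade.
result=$(printf '95\n72\n40\n' | awk '{
  if ($1 >= 90) { grade = "A" }
  else if ($1 >= 60) { grade = "B" }
  else { grade = "C" }
  print $1, grade
}')
echo "$result"
```

Spreading the program over several lines inside the single quotes keeps longer conditions readable.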

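awk actions 2: loops

I don't know yet how often these are used in real production environments. Never mind, train first

# for loop, form 1
for(initialization; condition; update) {
# statements
}
 
# for loop, form 2
for(variable in array) {
# statements
}
 
# while loop
while(condition) {
# statements
}
 
# do...while loop
do {
# statements
}while(condition)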

for

[root@yunmx scripts]# awk 'BEGIN{for(i=1;i<=10;i++){print i}}'
1
2
3
4
5
6
7
8
9
10
[root@yunmx scripts]#

while

[root@yunmx scripts]# awk 'BEGIN{i=1;while(i<=5){print i;i++}}'
1
2
3
4
5
[root@yunmx scripts]#

do while: the body always runs once, whether or not the condition holds, and only then is the condition checked

[root@yunmx scripts]# awk 'BEGIN{i=1;do {print "test";i++}while(i<=5)}'
test
test
test
test
test
[root@yunmx scripts]#

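Leaving a loop: break and continue

I was too lazy to try these out

exit in awk: no further actions are run on the input, which amounts to quitting the whole command (an END block, if there is one, still runs)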
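For completeness, a short sketch of all three. The loop bounds and the sample lines are made up; continue skips to the next iteration, break leaves the loop, and exit stops reading input.

```shell
# Skip even numbers with continue, and stop the loop entirely once i exceeds 7.
loop=$(awk 'BEGIN{for (i = 1; i <= 10; i++) { if (i % 2 == 0) continue; if (i > 7) break; print i }}')
# exit on line 2: line 1 has already been printed, lines 2 and 3 are not, END still runs.
early=$(printf 'a\nb\nc\n' | awk 'NR == 2 {exit} {print} END{print "end"}')
echo "$loop"
echo "$early"
```

The loop prints 1, 3, 5, 7; the second command prints a and then end.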

awk arrays

In most programming languages, array elements are referenced by subscript, and awk is no different

awk arrays need no declaration; just assign to an element

[root@yunmx scripts]# awk 'BEGIN{ceshi[0]=1;ceshi[1]=2;ceshi[2]=3;print ceshi[1]}'
2
[root@yunmx scripts]# awk 'BEGIN{ceshi[0]=1;ceshi[1]=2;ceshi[2]=3;print ceshi[2]}'
3
[root@yunmx scripts]#

If the statement gets too long, we can break the line with a backslash:

[root@yunmx scripts]# awk 'BEGIN{ceshi[0]=1;ceshi[1]=2;ceshi[2]=3;\
> ceshi[3]=4;ceshi[4]=5;print ceshi[3]}'
4
[root@yunmx scripts]#

Friendly reminder: in awk, an empty string is a legal value for an array element

Subscripts can also be strings (associative arrays):

[root@whale scripts]# awk 'BEGIN{ceshi["姓名1"]="赵";ceshi["姓名2"]="钱";print ceshi["姓名1"] }'
赵
[root@whale scripts]#

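awk arrays are "associative arrays" by nature. The numeric subscripts came first only to give readers an easier transition. Numerically indexed arrays do have advantages in some scenarios, but awk converts numeric subscripts to strings, so they too are really associative arrays indexed by strings.

Deleting an array element with delete: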

[root@whale scripts]# awk 'BEGIN{ceshi["姓名1"]="赵";ceshi["姓名2"]="钱";delete ceshi["姓名1"];print ceshi["姓名2"];print ceshi["姓名1"] }'
钱

[root@whale scripts]#

Deleting the whole array: delete arrayname

[root@whale scripts]# awk 'BEGIN{ceshi["姓名1"]="赵";ceshi["姓名2"]="钱";delete ceshi;print ceshi["姓名2"];print ceshi["姓名1"] }'


[root@whale scripts]#

Traversing with for

# for loop, form 1
for(initialization; condition; update) {
# statements
}
 
# for loop, form 2
for(variable in array) {
# statements
}

[root@yunmx scripts]# awk 'BEGIN{ceshi[0]=1;ceshi[1]=2;ceshi[2]=3;\
ceshi[3]=4;ceshi[4]=5;for(i in ceshi){print i,ceshi[i]}}'
4 5
0 1
1 2
2 3
3 4
[root@yunmx scripts]#

[root@whale scripts]# awk 'BEGIN{ceshi["姓名1"]="赵";ceshi["姓名2"]="钱";\
ceshi["姓名3"]="孙";ceshi["姓名4"]="李";\
for(i in ceshi){print i,ceshi[i]}}'
姓名1 赵
姓名2 钱
姓名3 孙
姓名4 李
[root@whale scripts]#

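Something is slightly off here: the first loop did not print in index order, because the traversal order of for(i in array) is unspecified. I will come back to this in later training and stop here for now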
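When the subscripts are consecutive numbers, one way to get a predictable order is to count the elements first and then loop over the indexes. A minimal sketch with a made-up three-element array:

```shell
result=$(awk 'BEGIN{
  a[1] = "x"; a[2] = "y"; a[3] = "z"
  n = 0
  for (k in a) n++                        # count the elements; visit order is irrelevant here
  for (i = 1; i <= n; i++) print i, a[i]  # index-driven loop: always 1, 2, 3
}')
echo "$result"
```

This only works when the subscripts really are 1 through n; for string subscripts the output is usually piped through sort instead.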

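awk built-in functions

Arithmetic functions
String functions
Time functions
Other functions

Arithmetic functions

rand / srand / int
To get random numbers from rand you need srand as well; otherwise rand returns the same value on every run: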

[root@whale scripts]# awk 'BEGIN{print rand()}'
0.237788
[root@whale scripts]# awk 'BEGIN{print rand()}'
0.237788
[root@whale scripts]# awk 'BEGIN{print rand()}'
0.237788
[root@whale scripts]#

Randomness needs srand:

[root@whale scripts]# awk 'BEGIN{srand();print rand()}'
0.513044
[root@whale scripts]# awk 'BEGIN{srand();print rand()}'
0.31852
[root@whale scripts]# awk 'BEGIN{srand();print rand()}'
0.788642
[root@whale scripts]#

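To get an integer in a given range, use int to keep only the integer part. With no argument, srand() seeds from the time of day (at one-second resolution in gawk), so two runs within the same second can print the same number, as the last two runs below show: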

[root@whale scripts]# awk 'BEGIN{srand();print int(100*rand())}'
35
[root@whale scripts]# awk 'BEGIN{srand();print int(10000*rand())}'
3437
[root@whale scripts]# awk 'BEGIN{srand();print int(10000*rand())}'
3745
[root@whale scripts]# awk 'BEGIN{srand();print int(10000*rand())}'
3745
[root@whale scripts]#
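A common variation is an integer in a closed range such as 1 to 6. Since rand() returns a value that is at least 0 and less than 1, scaling by 6 and adding 1 after truncation gives the range. This is a sketch; the output differs from run to run, so no expected value is shown.

```shell
# int(rand() * 6) is 0..5, so adding 1 yields 1..6, like a die roll.
roll=$(awk 'BEGIN{srand(); print int(rand() * 6) + 1}')
echo "$roll"
```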

String functions

Use gsub or sub to replace text

Replacing with gsub: gsub searches the given target for the given pattern and replaces every match with the given string

[root@whale scripts]# awk '{gsub("aa","AA",$2);print $2}' test.txt
AA
bb
cc
dd
ff
[root@whale scripts]# cat test.txt
192.168.0.101   aa
192.168.0.102   bb
192.168.0.104   cc
192.168.0.103   dd
192.168.0.101   ff
[root@whale scripts]#

[root@whale scripts]# cat test.txt
192.168.0.101   aa  cc
192.168.0.102   bb  aa
192.168.0.104   cc  aa
192.168.0.103   aa  cc
192.168.0.101   ff  dd
[root@whale scripts]# awk '{gsub("aa","AA",$2);print $2,$3}' test.txt
AA cc
bb aa
cc aa
AA cc
ff dd
[root@whale scripts]#

[root@whale scripts]# awk '{gsub("aa","AA",$0);print $2,$3}' test.txt
AA cc
bb AA
cc AA
AA cc
ff dd
[root@whale scripts]# cat test.txt
192.168.0.101   aa  cc
192.168.0.102   bb  aa
192.168.0.104   cc  aa
192.168.0.103   aa  cc
192.168.0.101   ff  dd
[root@whale scripts]#

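If the third argument (here $0) is omitted, it defaults to $0

sub is the single-replacement version: within the target it replaces only the first match
length returns the length of the given string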

[root@whale scripts]# awk 'BEGIN{a="dahsdkjhak";print a,length(a)}'
dahsdkjhak 10
[root@whale scripts]#

[root@whale scripts]# awk '{print $0,length()}' test.txt
192.168.0.101   aa  cc 22
192.168.0.102   bb  aa 22
192.168.0.104   cc  aa 22
192.168.0.103   aa  cc 22
192.168.0.101   ff  dd 22
[root@whale scripts]#
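To put sub, gsub, and length next to each other, here is a small sketch on a made-up line. The variable names first and n are illustrative; both functions return the number of replacements they made.

```shell
result=$(echo 'aa bb aa' | awk '{
  first = $0; sub(/aa/, "AA", first)   # sub changes only the first match, in a copy of the line
  n = gsub(/aa/, "XX")                 # gsub with no third argument changes every match in $0
  print first
  print $0
  print n, length($0)
}')
echo "$result"
```

The three output lines are "AA bb aa", "XX bb XX", and "2 8".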

index returns the position of the given substring within the string, or 0 if it is not found

[root@whale scripts]# awk '{print index($0,"aa")}' test.txt
17
21
21
17
0
[root@whale scripts]#

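split: cuts the given string at the given separator and stores each piece in an array element, creating the array on the fly. The resulting subscripts start at 1, unlike the arrays above that we started at 0, and for(i in array) may print the pieces out of order. I am not fond of it myself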

[root@whale scripts]# awk -v  name="赵##钱##孙##李" 'BEGIN{split(name,ceshi,"##");for(i in ceshi){print ceshi[i]}}'
李
赵
钱
孙
[root@whale scripts]#
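The ordering problem goes away if you use the return value of split, which is the number of pieces, and loop over the indexes. A sketch using romanized versions of the names above (ASCII only, to keep the example portable):

```shell
result=$(awk 'BEGIN{
  n = split("zhao##qian##sun##li", parts, "##")   # n is the number of pieces, here 4
  for (i = 1; i <= n; i++) print i, parts[i]      # indexes 1..n come out in order
}')
echo "$result"
```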

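There are other functions too. I don't like using them and doubt I will need them; I will learn them when I do

awk "ternary operator" and "printing odd and even lines"

Ternary operator

Syntax: condition ? result1 : result2

Meaning: if the condition holds, the expression yields result1; otherwise it yields result2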

[root@whale scripts]# awk -F: '{userid=$3<1000?"system users":"ordinary users";print $1,userid}' passwd
root system users
bin system users
daemon system users
adm system users
[root@whale scripts]#
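The ternary expression can also be used directly inside print without an intermediate variable. A sketch with two made-up numbers; the parentheses keep the expression from being misread as part of the print list.

```shell
# Label each number as even or odd in a single print statement.
result=$(printf '5\n12\n' | awk '{print $1, ($1 % 2 == 0 ? "even" : "odd")}')
echo "$result"
```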

Printing odd and even lines

[root@whale scripts]# cat -n passwd
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
     3  daemon:x:2:2:daemon:/sbin:/sbin/nologin
     4  adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]# awk 'i=!i' passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
[root@whale scripts]# awk '!(i=!i)' passwd
bin:x:1:1:bin:/bin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]#

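Damn, that is slick. All I can say is: memorize it and use it directly next time

In awk, 0 or an empty string means "false"; a non-zero number or a non-empty string means "true"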

[root@whale scripts]# awk '{print $0}' passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]# awk '2{print $0}' passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]# awk '2' passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]# awk '0{print $0}' passwd
[root@whale scripts]# awk '0' passwd
[root@whale scripts]#

Now negate: not true is false, and not false is true

[root@whale scripts]# awk '0' passwd
[root@whale scripts]# awk '!0' passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]#

Taking it further: an assignment is an expression too, and assigning a non-zero value makes it true

[root@whale scripts]# awk 'i=1' passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]# awk 'i=0' passwd
[root@whale scripts]#

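Back to printing odd and even lines:
- When awk starts on the first line, i has no value yet, i.e. it is empty, and 0 or an empty string means false. The ! negates it, so false becomes true, and the result is assigned back to i, so the pattern is true and line 1 is printed. On the next line i is true, the negation makes it false, and the line is skipped; and so on, alternating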

[root@whale scripts]# cat -n passwd
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
     3  daemon:x:2:2:daemon:/sbin:/sbin/nologin
     4  adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@whale scripts]# awk 'i=!i' passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
[root@whale scripts]#
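If the toggle trick feels too clever, a more explicit alternative (not the author's version) is to test the line number with the modulo operator. A sketch on four made-up lines:

```shell
# NR % 2 is 1 on odd lines and 0 on even lines.
odd=$(printf 'l1\nl2\nl3\nl4\n' | awk 'NR % 2 == 1')
even=$(printf 'l1\nl2\nl3\nl4\n' | awk 'NR % 2 == 0')
echo "$odd"
echo "$even"
```

One difference to keep in mind: with several input files NR keeps counting across them, while the i=!i toggle simply alternates, so the two agree here because both depend only on the overall line count.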

Now I get it

awk exercises

Process the following content: extract the domain names, then count and sort by domain

http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html

Solution:

[root@whale scripts]# awk -v FS="/" '{print $3}' oldboy.log|sort|uniq -c
      1 mp3.etiantian.org
      2 post.etiantian.org
      3 www.etiantian.org
[root@whale scripts]#
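The counting can also be done inside awk with an associative array, leaving only the final ordering to sort. This is an alternative sketch, not the solution above; the URLs are fed inline instead of from oldboy.log, and the array name count is made up.

```shell
result=$(printf '%s\n' \
  'http://www.etiantian.org/index.html' \
  'http://www.etiantian.org/1.html' \
  'http://post.etiantian.org/index.html' \
  'http://mp3.etiantian.org/index.html' \
  'http://www.etiantian.org/3.html' \
  'http://post.etiantian.org/2.html' |
  # With "/" as the separator, field 3 is the host name; count[host] tallies each one.
  awk -F/ '{count[$3]++} END{for (d in count) print count[d], d}' |
  sort -rn)   # highest count first
echo "$result"
```

Because the for(d in count) order is unspecified, the sort at the end is what makes the output order reliable.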

awk: inserting new fields. Insert the three fields e f g after the b in "a b c d"
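The original solution was an image that did not survive, so the following is only one possible approach: append the new text to field 2 and let awk rebuild the line.

```shell
# Assigning to $2 makes awk rebuild $0, joining the fields with OFS (a space by default).
result=$(echo 'a b c d' | awk '{$2 = $2 " e f g"; print}')
echo "$result"
```

Note that NF is still 4 right after the assignment: "b e f g" is a single field until the line is split again.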


For the following text: strip the leading and trailing whitespace from every line and left-align the columns.

      aaaa        bbb     ccc                 
   bbb     aaa ccc
ddd       fff             eee gg hh ii jj
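The original solution here was also an image, so this is a hedged sketch of one way to do it: rebuild each line field by field with a fixed column width. The width of 6 and the variable name line are assumptions chosen for this sample, and sprintf is the string-returning sibling of printf.

```shell
result=$(printf '      aaaa        bbb     ccc      \n   bbb     aaa ccc\n' | awk '{
  line = ""
  for (i = 1; i <= NF; i++) line = line sprintf("%-6s", $i)  # pad every field to 6 columns
  sub(/ +$/, "", line)                                        # drop the padding after the last field
  print line
}')
echo "$result"
```

Default field splitting already discards the leading and trailing whitespace, so only the alignment has to be done by hand.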


Filter the IPv4 addresses of every interface except lo from the output of ifconfig
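No solution is given for this one, so here is a sketch under stated assumptions. The sample text below is made up and imitates the newer net-tools layout, where an interface line starts at column 1 and the address line starts with spaces followed by "inet"; real ifconfig output differs between versions, so the patterns may need adjusting.

```shell
# Made-up sample standing in for real ifconfig output.
sample='eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.5  netmask 255.255.255.0  broadcast 10.0.0.255
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0'
result=$(printf '%s\n' "$sample" | awk '
  /^[^ ]/ { iface = $1; sub(/:$/, "", iface) }            # unindented line: remember the interface name
  $1 == "inet" && iface != "lo" { print iface, $2 }')     # IPv4 line of any interface except lo
echo "$result"
```

To try it on a live system, replace the printf part with the ifconfig command itself.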
