The awk Programming Language / Data Processing Engine
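I. Overview
1. awk checks input text against patterns, processing it line by line and printing the results
2. It is typically used in shell scripts to extract specific data
3. Used on its own, it can compute statistics over text data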
II. Common command formats
1. preceding-command | awk [options] '[condition] {action}'
2. awk [options] '[condition] {action}' file-path
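The two invocation forms can be sketched side by side. The sample data and the variable names (sample, from_pipe, from_file, tmpfile) are made up for illustration; both forms should produce the same result.

```shell
# Hypothetical sample data, two lines with two columns each.
sample='alice 30
bob 25'

# Form 1: a preceding command pipes its output into awk.
from_pipe=$(printf '%s\n' "$sample" | awk '{print $1}')

# Form 2: awk reads a file named on the command line.
tmpfile=$(mktemp)
printf '%s\n' "$sample" > "$tmpfile"
from_file=$(awk '{print $1}' "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$from_pipe"
printf '%s\n' "$from_file"
```

Each printf prints alice and bob on separate lines, since the awk program is identical and only the source of the input differs.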
III. Common options
1. -F: sets the input field separator; optional (by default, fields are separated by runs of spaces and tabs)
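A small sketch of what -F changes, using one passwd-style line as input. The variable names are illustrative only.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Without -F the line contains no blanks, so the whole line is a single field.
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')

# With -F: the same line splits into seven fields.
colon_nf=$(printf '%s\n' "$line" | awk -F: '{print NF}')
shell_field=$(printf '%s\n' "$line" | awk -F: '{print $7}')

echo "$default_nf $colon_nf $shell_field"
```

This prints `1 7 /bin/bash`: the separator decides how many fields awk sees and what $7 refers to.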
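IV. Common built-in variables
1. FS: holds or sets the field separator, for example FS=":"; it has the same effect as -F
2. $n: the nth field of the current line; $1 and $2 are the first and second columns
3. $0: the entire current line
4. NF: the number of fields (columns) in the current line
5. NR: the number of the line currently being read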
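The variables above can be combined in one command. This sketch sets FS inside a BEGIN block instead of using -F, and also uses $NF, which is standard awk for "the last field" because NF holds the field count. The two input lines are taken from the passwd.txt sample used later in this article.

```shell
data='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# FS is assigned in BEGIN so it is in effect before the first line is split.
out=$(printf '%s\n' "$data" | awk 'BEGIN{FS=":"} {print NR, NF, $1, $NF}')
printf '%s\n' "$out"
```

The output is `1 7 root /bin/bash` followed by `2 7 bin /sbin/nologin`.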
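Examples
(1) Print the IP address of the ens33 network interface. /ens33$/ selects lines that end with ens33, and $2 prints the second column; the default separator is whitespace.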
[root@localhost ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:7f:4c:93 brd ff:ff:ff:ff:ff:ff
inet 192.168.4.8/24 brd 192.168.4.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::91e5:778d:5a20:5a4f/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 52:54:00:09:b5:c0 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:09:b5:c0 brd ff:ff:ff:ff:ff:ff
[root@localhost ~]$ ip a s | awk '/ens33$/{print $2}'
192.168.4.8/24
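If only the bare address is wanted, without the /24 prefix length, one possible approach is awk's split() function. In this sketch a single line copied from the `ip a s` output above stands in for the live command, and the array name parts is arbitrary.

```shell
# One line from the ip output above, used as fixed input.
line='    inet 192.168.4.8/24 brd 192.168.4.255 scope global noprefixroute ens33'

# split() breaks $2 on "/" into the array parts; parts[1] is the address itself.
addr=$(printf '%s\n' "$line" | awk '/ens33$/{split($2, parts, "/"); print parts[1]}')
echo "$addr"
```

This prints `192.168.4.8`.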
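(2) Print the CPU model. The value contains several spaces, so use the -F option to set the field separator to ":". /Model name/ selects lines containing Model name, and $2 prints the second field.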
[root@localhost ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 151
Model name: 12th Gen Intel(R) Core(TM) i7-12700KF
Stepping: 2
CPU MHz: 3609.598
BogoMIPS: 7219.19
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 25600K
NUMA node0 CPU(s): 0,1
[root@localhost ~]$ lscpu |awk -F: '/Model name/ {print $2}'
12th Gen Intel(R) Core(TM) i7-12700KF
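lscpu commonly pads the value with spaces after the colon, in which case $2 starts with those spaces. A separator longer than one character is treated as a regular expression, so `-F': +'` (a colon followed by one or more spaces) is one way to drop the padding. The padded input line below is an assumption about how the raw output looks.

```shell
# Hypothetical line, padded the way lscpu commonly prints it.
line='Model name:            12th Gen Intel(R) Core(TM) i7-12700KF'

# Plain -F: keeps the padding at the start of $2.
plain=$(printf '%s\n' "$line" | awk -F: '/Model name/{print $2}')

# A regex separator swallows the colon and the spaces that follow it.
trimmed=$(printf '%s\n' "$line" | awk -F': +' '/Model name/{print $2}')

printf '[%s]\n[%s]\n' "$plain" "$trimmed"
```

The brackets make the difference visible: the first result still carries leading spaces, the second does not.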
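(3) With ":" as the separator, print the line number (NR), the number of fields (NF), and the full content ($0) of every line in passwd.txt, a file of operating system user entries.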
[root@localhost ~]# awk -F: '{print NR,NF,$0}' passwd.txt
1 7 root:x:0:0:root:/root:/bin/bash
2 7 bin:x:1:1:bin:/bin:/sbin/nologin
3 7 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 7 ntp:x:38:38::/etc/ntp:/sbin/nologin
5 7 gdm:x:42:42::/var/lib/gdm:/sbin/nologin
6 7 gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
7 7 sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
8 7 avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
9 7 postfix:x:89:89::/var/spool/postfix:/sbin/nologin
10 7 tcpdump:x:72:72::/:/sbin/nologin
11 7 lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
12 7 apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
(4) Print the line number and the full content of the line containing apache in /etc/passwd
[root@localhost ~]$ awk -F: '/apache/{print NR,$0}' /etc/passwd
44 apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
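Note that /apache/ matches the text anywhere on the line. When the goal is the user named apache, comparing the first field is stricter. In the sketch below the second input line is hypothetical, added only to show the difference.

```shell
# The second line is made up: "apache" appears only in its comment field.
data='apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
webadmin:x:1001:1001:apache admin:/home/webadmin:/bin/bash'

# Pattern match against the whole line: both lines qualify.
loose=$(printf '%s\n' "$data" | awk -F: '/apache/{print NR, $1}')

# Exact comparison on the user name field: only the first line qualifies.
exact=$(printf '%s\n' "$data" | awk -F: '$1=="apache"{print NR, $1}')

printf '%s\n---\n%s\n' "$loose" "$exact"
```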
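V. awk processing stages, for scheduling extra tasks
- BEGIN{ }: commands inside the BEGIN braces run exactly once, before any input is read
- { } per-line action: commands inside plain braces run once per input line, so n times for a file of n lines
- END{ }: commands inside the END braces run exactly once, after all input has been read
- Example:
- File contents: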
[root@localhost ~]$ cat ys.txt
shenli 18 1.56 50000
shenli 18 1.56 50000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
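(1) Print the column headers name\tage\theight\tmora above the file contents. In the command, "\t" is a tab character. Because the print statement is inside BEGIN{ }, it prints a single line, and it runs before any input line is processed.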
[root@localhost ~]$ awk 'BEGIN{print "name\tage\theight\tmora"}{print $0}' ys.txt
name age height mora
shenli 18 1.56 50000
shenli 18 1.56 50000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
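(2) The line of "=" characters in the per-line { } action is printed 5 times, because the file has 5 lines. The command in END{ } runs last; like BEGIN, it runs only once and prints one line.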
[root@localhost ~]$ awk 'BEGIN{print "name\tage\theight\tmora"}{print "==============================="}{print $0} END {print "==============END=============="}' ys.txt
name age height mora
===============================
shenli 18 1.56 50000
===============================
shenli 18 1.56 50000
===============================
kaiya 30 1.86 30000
===============================
kaiya 30 1.86 30000
===============================
kaiya 30 1.86 30000
==============END==============
(3) Moving the "=" line from the per-line { } action into BEGIN{ } makes the difference obvious: it is now printed only once. "\n" is a newline character.
[root@localhost ~]$ awk 'BEGIN{print "=============BEGIN=============\nname\tage\theight\tmora"}{print $0} END {print "==============END=============="}' ys.txt
=============BEGIN=============
name age height mora
shenli 18 1.56 50000
shenli 18 1.56 50000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
==============END==============
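(4) To sum a numeric column, accumulate it in a variable inside the per-line { } action (here mora+=$4), then print the variable in END to get the total of the fourth column. Note that the variable mora takes no $ prefix and must not be placed inside double quotes.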
[root@localhost ~]$ awk 'BEGIN{print "=============BEGIN=============\nname\tage\theight\tmora"} {mora+=$4}{print $0} END {print "sum\t\t\t"mora"\n==============END=============="}' ys.txt
=============BEGIN=============
name age height mora
shenli 18 1.56 50000
shenli 18 1.56 50000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
sum 190000
==============END==============
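The same accumulate-then-report pattern extends to an average, since NR still holds the number of lines read when END runs. This is a minimal sketch over the five ys.txt lines shown in the outputs above; the NR > 0 guard is an added precaution against dividing by zero on empty input.

```shell
data='shenli 18 1.56 50000
shenli 18 1.56 50000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
kaiya 30 1.86 30000'

# Accumulate column 4 on every line, then report once in END.
out=$(printf '%s\n' "$data" | awk '{mora += $4} END {if (NR > 0) print "sum=" mora, "avg=" (mora / NR)}')
echo "$out"
```

This prints `sum=190000 avg=38000`, matching the total shown above.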
VI. Using regular expressions
- ~ matches, !~ does not match
- Example:
- File contents:
[root@localhost ~]$ cat passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
(1) Print lines that contain bin, no matter which column it appears in
[root@localhost ~]$ awk -F: '/bin/{print}' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
(2) Print lines whose 6th column contains bin
[root@localhost ~]$ awk -F: '$6~/bin/{print}' passwd.txt
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
(3) Print lines whose 6th column does not contain bin
[root@localhost ~]$ awk -F: '$6!~/bin/{print}' passwd.txt
root:x:0:0:root:/root:/bin/bash
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
(4) When the action is just {print} and a condition is given, {print} can be omitted
[root@localhost ~]$ awk -F: '/bin/' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
(5) Print lines whose 6th column contains bin, omitting {print}
[root@localhost ~]$ awk -F: '$6~/bin/' passwd.txt
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
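The patterns used with ~ and !~ are full regular expressions, so anchors such as ^ and $ work on a single field as well. The sketch below uses four lines from passwd.txt; the variable names are illustrative.

```shell
data='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash'

# Field 7 ends with "bash".
bash_users=$(printf '%s\n' "$data" | awk -F: '$7 ~ /bash$/ {print $1}')

# Field 1 does not start with "s".
not_s=$(printf '%s\n' "$data" | awk -F: '$1 !~ /^s/ {print $1}')

printf '%s\n---\n%s\n' "$bash_users" "$not_s"
```

The first command prints root and lrh; the second prints root, bin, and lrh.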
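VII. Comparing numbers or strings
- == equal, != not equal, >= greater than or equal, > greater than, <= less than or equal, < less than
- Example:
- File contents: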
[root@localhost ~]$ cat passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
(1) Print line 3
[root@localhost ~]$ awk -F: 'NR==3' passwd.txt
daemon:x:2:2:daemon:/sbin:/sbin/nologin
(2) Print lines 1 to 2
[root@localhost ~]$ awk -F: 'NR<3' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
(3) Find users whose uid is between 0 and 9
[root@localhost ~]$ awk -F: '$3<10' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
(4) Find regular users (uid 1000 or above)
[root@localhost ~]$ awk -F: '$3>=1000' passwd.txt
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
(5) Find users whose login shell is /bin/bash
[root@localhost ~]$ awk -F: '$7=="/bin/bash"' passwd.txt
root:x:0:0:root:/root:/bin/bash
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
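The examples above cover ==, < and >=. As a sketch of !=, the commands below compare a field against a string and then compare two fields against each other, using four lines from passwd.txt.

```shell
data='root:x:0:0:root:/root:/bin/bash
ntp:x:38:38::/etc/ntp:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash'

# String comparison: users whose shell is anything other than /sbin/nologin.
can_login=$(printf '%s\n' "$data" | awk -F: '$7 != "/sbin/nologin" {print $1}')

# Field-to-field comparison: users whose uid differs from their gid.
mismatch=$(printf '%s\n' "$data" | awk -F: '$3 != $4 {print $1}')

printf '%s\n---\n%s\n' "$can_login" "$mismatch"
```

The first command prints root and lrh; the second prints gnome-initial-setup, whose uid is 989 and gid is 983.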
VIII. Logical combinations
- && and, || or
- Example:
- File contents:
[root@localhost ~]$ cat passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
(1) Find lines whose uid is in the range 10 to 40
[root@localhost ~]$ awk -F: '$3>=10&&$3<=40' passwd.txt
ntp:x:38:38::/etc/ntp:/sbin/nologin
(2) Print lines 2 to 10
[root@localhost ~]$ awk -F: 'NR>=2&&NR<=10' passwd.txt
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
(3) Find lines whose uid is between 0 and 4 or greater than 1001
[root@localhost ~]$ awk -F: '$3<5||$3>1001' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
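When && and || appear in the same condition, && binds more tightly, so parentheses are worth adding whenever the grouping matters. The sketch below runs the same tests with and without parentheses on five lines from passwd.txt.

```shell
data='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash'

# Grouped: (uid below 5 or uid at least 1000) and the shell is bash.
grouped=$(printf '%s\n' "$data" | awk -F: '($3 < 5 || $3 >= 1000) && $7 == "/bin/bash" {print $1}')

# Ungrouped: parsed as uid below 5, or (uid at least 1000 and the shell is bash).
ungrouped=$(printf '%s\n' "$data" | awk -F: '$3 < 5 || $3 >= 1000 && $7 == "/bin/bash" {print $1}')

printf '%s\n---\n%s\n' "$grouped" "$ungrouped"
```

The grouped form prints root and lrh, while the ungrouped form also lets bin and daemon through.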
IX. Using an array to count lines that share a value in a column
- Example:
- File contents:
[root@localhost ~]$ cat ys.txt
shenli 18 1.56 50000
shenli 18 1.56 50000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
kaiya 30 1.86 30000
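(1) Count how many lines share each name in the first column. The array name can hold many values, one per subscript. name[$1]++ adds 1 to the element whose subscript is the first-column name, so every repeated name increments its own counter. The END action then prints name["shenli"] and name["kaiya"], which are 2 and 3. However, this approach prints only the counts; it cannot show each name alongside the number of times it occurs.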
[root@localhost ~]$ awk '{name[$1]++}END{print name["shenli"],name["kaiya"]}' ys.txt
2 3
X. Advanced searching with arrays and for loops
- Example:
- File contents:
[root@localhost ~]$ cat /var/log/httpd/access_log
127.0.0.1 - - [03/Jul/2022:21:12:57 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
127.0.0.1 - - [03/Jul/2022:21:13:04 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
127.0.0.1 - - [03/Jul/2022:21:13:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
127.0.0.1 - - [03/Jul/2022:21:13:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
192.168.4.1 - - [03/Jul/2022:21:15:13 +0800] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET / HTTP/1.1" 403 4897 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET /noindex/css/bootstrap.min.css HTTP/1.1" 200 19341 "http://192.168.4.8/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET /noindex/css/fonts/Light/OpenSans-Light.ttf HTTP/1.1" 404 240 "http://192.168.4.8/noindex/css/open-sans.css" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET /favicon.ico HTTP/1.1" 404 209 "http://192.168.4.8/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.20 - - [03/Jul/2022:21:20:03 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:04 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:05 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:05 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:05 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
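(1) Use a for loop to print every subscript of the array ip together with its value. In for(i in ip), i is a variable that takes each subscript in turn, and in is fixed syntax. ip is the array name, and ip[$1]++ adds 1 to the element whose subscript is the IP address in the first column, so each distinct address gets its own counter. The END action then loops over the array and prints each subscript and its value.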
[root@localhost ~]$ awk '{ip[$1]++}END{for(i in ip){print i,ip[i] } }' /var/log/httpd/access_log
127.0.0.1 4
192.168.4.1 5
192.168.4.20 7
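awk does not guarantee any particular order for for(i in ip), so a common follow-up is to pipe the result through sort to rank addresses by request count. The log lines below are shortened, hypothetical stand-ins for the access log above.

```shell
# Shortened, made-up log lines; only the first column matters here.
log='192.168.4.1 - - "GET / HTTP/1.1" 403
192.168.4.20 - - "GET / HTTP/1.1" 403
127.0.0.1 - - "GET / HTTP/1.1" 403
192.168.4.20 - - "GET / HTTP/1.1" 403
192.168.4.1 - - "GET / HTTP/1.1" 403
192.168.4.20 - - "GET / HTTP/1.1" 403'

# Count per address in awk, then sort on column 2, numerically, largest first.
ranked=$(printf '%s\n' "$log" | awk '{ip[$1]++} END {for (i in ip) print i, ip[i]}' | sort -k2,2nr)
printf '%s\n' "$ranked"
```

With this input the busiest address, 192.168.4.20 with 3 requests, comes first, followed by 192.168.4.1 with 2 and 127.0.0.1 with 1.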
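XI. Using if in awk
(1) Set the variable num=21. If the remainder of 21 divided by 2 equals 0, then 21 is even; otherwise it is odd. In printf, %d prints an integer (right-aligned by default when a field width is given) and is filled by the num argument that follows the format string. printf does not append a newline to its output, so \n must be added explicitly.
[root@localhost ~]$ awk 'BEGIN{num=21; if(num % 2 == 0)printf "%d even\n",num;else printf "%d odd\n",num}'
21 odd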
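if can also be used inside the per-line action and chained with else if. The sketch below labels each account by uid; the 1000 threshold for regular users follows the earlier example in this article, while the label names and the variable kind are made up for illustration.

```shell
data='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash'

# Classify every line by its uid in field 3, then print the name and label.
out=$(printf '%s\n' "$data" | awk -F: '{
    if ($3 == 0) {
        kind = "superuser"
    } else if ($3 < 1000) {
        kind = "system"
    } else {
        kind = "regular"
    }
    print $1, kind
}')
printf '%s\n' "$out"
```

This prints `root superuser`, `bin system`, and `lrh regular`, one per line.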