Introduction to AWK and How to Use It

awk: a programming language and data-processing engine

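I. Overview:

1. Matches input text against patterns, processing it line by line and printing the results
2. Commonly used in shell scripts to extract specific data
3. Used on its own, it can compute statistics over text data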

II. Common command formats

1. upstream-command | awk [options] '[condition] {action}'
2. awk [options] '[condition] {action}' file-path
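Both forms run the same program. The only difference is where the input comes from: with no file operand, awk reads standard input, and that is what makes the pipe form work. The sketch below assumes a POSIX shell and a POSIX-compatible awk (gawk, mawk or busybox awk). The function names are hypothetical helpers, not part of awk.

```shell
#!/bin/sh
# Hypothetical helpers: print the first field of every line, once per form.

# Form 2: awk '[condition] {action}' file-path
first_field_from_file() {
  awk '{print $1}' "$1"
}

# Form 1: upstream-command | awk '[condition] {action}'
# No file operand is given, so awk reads the pipe on standard input.
first_field_from_pipe() {
  cat "$1" | awk '{print $1}'
}
```

For any file, `first_field_from_file FILE` and `first_field_from_pipe FILE` should print the same column.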

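III. Common options

1. -F: sets the input field separator used to split each line into fields. It can be omitted, in which case fields are separated by runs of spaces and tabs.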
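To compare the default separator with -F, here is a small sketch under the same assumptions (POSIX shell and awk, hypothetical function names). With the default, a run of blanks counts as a single separator and leading blanks are skipped. With a single-character separator such as ":", every occurrence splits, so two adjacent colons produce an empty field.

```shell
#!/bin/sh
# Default separator: runs of spaces/tabs act as one separator and
# leading blanks are ignored, so "  a   b" has $1="a" and $2="b".
second_field_default() {
  awk '{print $2}'
}

# -F: splits on every ":"; empty fields are kept, so
# "ntp:x:38:38::/etc/ntp" has an empty 5th field.
fifth_field_colon() {
  awk -F: '{print $5}'
}
```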

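IV. Common built-in variables

1. FS: holds or sets the field separator, e.g. FS=":" (usually set in a BEGIN block); equivalent to -F
2. $n: the nth field after splitting, e.g. $1 and $2 are the first and second columns
3. $0: the entire current input line
4. NF: the number of fields in the current line
5. NR: the number of the current line, i.e. how many lines have been read so far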
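A short sketch of these variables, assuming a POSIX awk; `line_info` is a hypothetical name. Because NF is the number of the last field, $NF is the last field itself. NR keeps counting across multiple input files, whereas the related variable FNR restarts at 1 for each file.

```shell
#!/bin/sh
# Setting FS in BEGIN has the same effect as -F:.
# For each line print the line number (NR), the field count (NF)
# and the last field ($NF, the field whose number is NF).
line_info() {
  awk 'BEGIN { FS = ":" } { print NR, NF, $NF }'
}
```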

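Examples
(1) Print the IP address of the ens33 interface. /ens33$/ selects lines that end in ens33, and $2 prints the second field, using the default whitespace separator.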

[root@localhost ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:7f:4c:93 brd ff:ff:ff:ff:ff:ff
    inet 192.168.4.8/24 brd 192.168.4.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::91e5:778d:5a20:5a4f/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:09:b5:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:09:b5:c0 brd ff:ff:ff:ff:ff:ff
[root@localhost ~]$ ip a s | awk '/ens33$/{print $2}'
192.168.4.8/24

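(2) Print the CPU model. The value itself contains several spaces, so the default separator would split it into many fields; instead, use -F to set the separator to ":". /Model name/ selects the line containing Model name, and $2 prints the second field (including the blanks that follow the colon).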

[root@localhost ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 151
Model name:            12th Gen Intel(R) Core(TM) i7-12700KF
Stepping:              2
CPU MHz:               3609.598
BogoMIPS:              7219.19
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             48K
L1i cache:             32K
L2 cache:              1280K
L3 cache:              25600K
NUMA node0 CPU(s):     0,1
[root@localhost ~]$ lscpu |awk -F: '/Model name/ {print $2}'
            12th Gen Intel(R) Core(TM) i7-12700KF
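The value printed above keeps the blanks after the colon, because -F: splits only on ":". When the separator is longer than one character, awk treats it as an extended regular expression, so ': +' (a colon followed by one or more spaces) also removes the padding. A minimal sketch; `cpu_model_name` is a hypothetical helper that assumes lscpu-style `Key:   value` lines in English on standard input.

```shell
#!/bin/sh
# Print the CPU model without leading blanks. A multi-character FS is
# an ERE, so ": +" consumes the colon and all spaces after it.
cpu_model_name() {
  awk -F': +' '/^Model name/ {print $2}'
}
```

Usage: `lscpu | cpu_model_name`.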

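(3) Using ":" as the separator, print the line number (NR), the number of fields (NF) and the whole line ($0) for every line of passwd.txt, a copy of the system's user account file.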

[root@localhost ~]# awk -F: '{print NR,NF,$0}' passwd.txt
1 7 root:x:0:0:root:/root:/bin/bash
2 7 bin:x:1:1:bin:/bin:/sbin/nologin
3 7 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 7 ntp:x:38:38::/etc/ntp:/sbin/nologin
5 7 gdm:x:42:42::/var/lib/gdm:/sbin/nologin
6 7 gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
7 7 sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
8 7 avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
9 7 postfix:x:89:89::/var/spool/postfix:/sbin/nologin
10 7 tcpdump:x:72:72::/:/sbin/nologin
11 7 lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
12 7 apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

(4) Print the line number and the full line of the apache user's entry in /etc/passwd

[root@localhost ~]$ awk -F: '/apache/{print NR,$0}' /etc/passwd
44 apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

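V. awk's processing stages, for scheduling extra tasks

  1. BEGIN{ }: commands in the BEGIN braces run once, before the first input line is read
  2. { } per-line task: commands in these braces run once for every input line, so the number of runs equals the number of lines
  3. END{ }: commands in the END braces run once, after the last input line has been processed
  • Example:
  • The file contents are as follows: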
[root@localhost ~]$ cat ys.txt
shenli  18      1.56    50000
shenli  18      1.56    50000
kaiya   30      1.86    30000
kaiya   30      1.86    30000
kaiya   30      1.86    30000

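(1) Print the column headers name\tage\theight\tmora above the file contents, where "\t" is a tab character. Because this print is inside BEGIN{ }, it runs only once and before any line is read, so the header appears a single time at the top.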

[root@localhost ~]$ awk  'BEGIN{print "name\tage\theight\tmora"}{print $0}' ys.txt
name    age     height  mora
shenli  18      1.56    50000
shenli  18      1.56    50000
kaiya   30      1.86    30000
kaiya   30      1.86    30000
kaiya   30      1.86    30000

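(2) The "=" line inside the per-line { } task is printed 5 times, because the file has 5 lines. The END{ } commands run last; like BEGIN, END runs only once and prints a single line.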

[root@localhost ~]$ awk 'BEGIN{print "name\tage\theight\tmora"}{print "==============================="}{print $0} END {print "==============END=============="}' ys.txt
name    age     height  mora
===============================
shenli  18      1.56    50000
===============================
shenli  18      1.56    50000
===============================
kaiya   30      1.86    30000
===============================
kaiya   30      1.86    30000
===============================
kaiya   30      1.86    30000
==============END==============
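These execution counts can be checked directly by keeping a counter in each part of the program. A minimal sketch, assuming a POSIX awk; `count_runs` is a hypothetical name. The `n + 0` prints a numeric 0 when there is no input: in that case the per-line block never runs, but BEGIN and END still do.

```shell
#!/bin/sh
# b counts BEGIN runs, n counts per-line runs, e counts END runs.
count_runs() {
  awk 'BEGIN { b++ } { n++ } END { e++; print b, n + 0, e }'
}
```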

(3) Moving the "=" line from the per-line { } task into BEGIN{ } makes the difference obvious: it is printed only once. "\n" is a newline character.

[root@localhost ~]$ awk  'BEGIN{print "=============BEGIN=============\nname\tage\theight\tmora"}{print $0} END {print "==============END=============="}' ys.txt
=============BEGIN=============
name    age     height  mora
shenli  18      1.56    50000
shenli  18      1.56    50000
kaiya   30      1.86    30000
kaiya   30      1.86    30000
kaiya   30      1.86    30000
==============END==============

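(4) To total a numeric column, accumulate it into a variable in the per-line { } task (here mora+=$4) and print the variable in END; this gives the sum of the fourth column. Note that a variable name takes no $ prefix and must not be placed inside double quotes, otherwise it is printed as literal text.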

[root@localhost ~]$ awk 'BEGIN{print "=============BEGIN=============\nname\tage\theight\tmora"} {mora+=$4}{print $0} END {print "sum\t\t\t"mora"\n==============END=============="}' ys.txt
=============BEGIN=============
name    age     height  mora
shenli  18      1.56    50000
shenli  18      1.56    50000
kaiya   30      1.86    30000
kaiya   30      1.86    30000
kaiya   30      1.86    30000
sum                     190000
==============END==============
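The same accumulator pattern extends to an average, because NR still holds the number of lines read when END runs. A sketch under the same assumptions; `sum_and_avg_col4` is a hypothetical helper, and the `NR > 0` guard avoids dividing by zero on empty input. An unassigned variable behaves as 0 in arithmetic, so `total` needs no initialisation.

```shell
#!/bin/sh
# Print the sum of column 4 and its average over all lines.
sum_and_avg_col4() {
  awk '{ total += $4 } END { if (NR > 0) print total, total / NR }'
}
```

For the ys.txt data used in this section, this should print `190000 38000`.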

VI. Using regular expressions

  • ~ matches, !~ does not match
  • Examples:
  • The file contents are as follows:
[root@localhost ~]$ cat passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

(1) Print lines containing bin, no matter which field it appears in

[root@localhost ~]$ awk -F: '/bin/{print}' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

(2) Print lines whose 6th field contains bin

[root@localhost ~]$ awk -F: '$6~/bin/{print}' passwd.txt
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

(3) Print lines whose 6th field does not contain bin

[root@localhost ~]$ awk -F: '$6!~/bin/{print}' passwd.txt
root:x:0:0:root:/root:/bin/bash
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

(4) When the action is just {print} and a condition is given, {print} can be omitted, because printing the whole line is awk's default action

[root@localhost ~]$ awk -F: '/bin/' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

(5) Print lines whose 6th field contains bin, with {print} omitted

[root@localhost ~]$ awk -F: '$6~/bin/' passwd.txt
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
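Note that $6~/bin/ is a substring match: daemon's home directory /sbin matches because it contains bin. When the whole field must equal a value, anchor the regex with ^ and $. A sketch assuming a POSIX awk; the function names are hypothetical.

```shell
#!/bin/sh
# Substring match: /sbin, /bin and /usr/bin all contain "bin".
home_contains_bin() {
  awk -F: '$6 ~ /bin/'
}

# Anchored match: the 6th field must be exactly /bin
# (the slash is escaped inside the /.../ regex literal).
home_is_bin() {
  awk -F: '$6 ~ /^\/bin$/'
}
```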

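VII. Comparing numbers or strings

  • == equal, != not equal, >= greater than or equal, > greater than, <= less than or equal, < less than
  • Examples:
  • The file contents are as follows: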
[root@localhost ~]$ cat passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

(1) Print line 3

[root@localhost ~]$ awk -F: 'NR==3' passwd.txt
daemon:x:2:2:daemon:/sbin:/sbin/nologin

(2) Print lines 1 to 2

[root@localhost ~]$ awk -F: 'NR<3' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin

(3) Find users whose uid is 0 to 9

[root@localhost ~]$ awk -F: '$3<10' passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

(4) Find regular users (uid 1000 and above)

[root@localhost ~]$ awk -F: '$3>=1000' passwd.txt
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash

(5) Find users whose login shell is /bin/bash

[root@localhost ~]$ awk -F: '$7=="/bin/bash"' passwd.txt
root:x:0:0:root:/root:/bin/bash
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
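Comparisons such as $3>=1000 behave numerically because a field that looks like a number is compared as a number against a numeric constant. That is why uid 989 fails $3>=1000 even though the string "989" sorts after "1000". A small sketch assuming a POSIX awk; the function names are hypothetical.

```shell
#!/bin/sh
# A numeric-looking field vs a numeric constant: numeric comparison,
# so 989 >= 1000 is false and only uids of 1000 and above pass.
regular_users() {
  awk -F: '$3 >= 1000'
}

# Two string constants always compare as strings, character by
# character: "989" > "1000" because '9' > '1'.
string_compare_demo() {
  awk 'BEGIN { if ("989" > "1000") print "string order"; else print "numeric order" }'
}
```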

VIII. Logical combinations

  • && and, || or
  • Examples:
  • The file contents are as follows:
[root@localhost ~]$ cat passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
lrh:x:1000:1000:lrh:/home/lrh:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

(1) Find lines whose uid is between 10 and 40

[root@localhost ~]$ awk -F: '$3>=10&&$3<=40' passwd.txt
ntp:x:38:38::/etc/ntp:/sbin/nologin

(2) Print lines 2 to 10

[root@localhost ~]$ awk -F: 'NR>=2&&NR<=10' passwd.txt
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
gnome-initial-setup:x:989:983::/run/gnome-initial-setup/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin

(3) Find lines whose uid is 0 to 4 or greater than 1001

[root@localhost ~]$ awk -F: '$3<5||$3>1001'  passwd.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
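Hard-coded limits such as 10 and 40 can instead be passed in from the shell with awk's standard -v option. A sketch assuming a POSIX shell and awk; `uid_between` is a hypothetical helper that reads passwd-style lines on standard input. Adding 0 to each operand forces a numeric comparison.

```shell
#!/bin/sh
# Print lines whose uid ($3) lies in the inclusive range [$1, $2].
uid_between() {
  awk -F: -v lo="$1" -v hi="$2" '$3 + 0 >= lo + 0 && $3 + 0 <= hi + 0'
}
```

With the file above, `uid_between 10 40 < passwd.txt` should print the same ntp line as example (1).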

IX. Using arrays to count lines with the same value in a column

  • Example:
  • The file contents are as follows:
[root@localhost ~]$ cat ys.txt
shenli  18      1.56    50000
shenli  18      1.56    50000
kaiya   30      1.86    30000
kaiya   30      1.86    30000
kaiya   30      1.86    30000

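(1) Count how many lines share each name in the first column. The array name holds one element per distinct key, and name[$1]++ adds 1 to the element whose index is the name in the first field. The END task then prints name["shenli"] and name["kaiya"], which are 2 and 3. The drawback of this approach is that the names must be known in advance, and each name is not printed next to its count.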

[root@localhost ~]$ awk '{name[$1]++}END{print name["shenli"],name["kaiya"]}' ys.txt
2 3
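When only one key is of interest, its name can be passed in with -v instead of being written into the program. A sketch assuming a POSIX awk; `count_name` is a hypothetical helper. An element that was never incremented prints as an empty string, so `+ 0` turns a missing name into 0.

```shell
#!/bin/sh
# Count lines whose first field equals the given name.
# c[$1]++ creates the element on first use (starting from 0) and adds 1.
count_name() {
  awk -v n="$1" '{ c[$1]++ } END { print c[n] + 0 }'
}
```

Usage: `count_name kaiya < ys.txt`.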

X. Using arrays and a for loop for advanced searches

  • Example:
  • The file contents are as follows:
[root@localhost ~]$ cat /var/log/httpd/access_log
127.0.0.1 - - [03/Jul/2022:21:12:57 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
127.0.0.1 - - [03/Jul/2022:21:13:04 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
127.0.0.1 - - [03/Jul/2022:21:13:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
127.0.0.1 - - [03/Jul/2022:21:13:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.29.0"
192.168.4.1 - - [03/Jul/2022:21:15:13 +0800] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET / HTTP/1.1" 403 4897 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET /noindex/css/bootstrap.min.css HTTP/1.1" 200 19341 "http://192.168.4.8/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET /noindex/css/fonts/Light/OpenSans-Light.ttf HTTP/1.1" 404 240 "http://192.168.4.8/noindex/css/open-sans.css" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.1 - - [03/Jul/2022:21:15:14 +0800] "GET /favicon.ico HTTP/1.1" 404 209 "http://192.168.4.8/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
192.168.4.20 - - [03/Jul/2022:21:20:03 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:04 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:05 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:05 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:05 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"
192.168.4.20 - - [03/Jul/2022:21:20:06 +0800] "GET / HTTP/1.1" 403 4897 "-" "curl/7.61.1"

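(1) Use a for loop to print each index of the array ip together with its value. In for(i in ip), i is a variable that takes each index in turn, and in is fixed syntax. ip is the array name, and ip[$1]++ adds 1 to the element indexed by the IP address in the first field, so each distinct IP becomes an index. The END task then loops over the array and prints each index and its value. The order in which for (i in ip) visits the indexes is unspecified, so the output is not guaranteed to be sorted.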

[root@localhost ~]$ awk '{ip[$1]++}END{for(i in ip){print i,ip[i] } }'  /var/log/httpd/access_log
127.0.0.1 4
192.168.4.1 5
192.168.4.20 7
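Since awk does not define the traversal order of for (i in ip), piping the result through sort gives a repeatable listing. A sketch assuming a POSIX shell, awk and sort; `hits_per_ip` is a hypothetical helper that reads the files given as arguments, or standard input when there are none. LC_ALL=C makes sort compare plain bytes, so the order does not depend on the locale.

```shell
#!/bin/sh
# Count requests per client IP (field 1 of an access log) and print
# "ip count" pairs in a stable, byte-sorted order.
hits_per_ip() {
  awk '{ ip[$1]++ } END { for (i in ip) print i, ip[i] }' "$@" | LC_ALL=C sort
}
```

Usage: `hits_per_ip /var/log/httpd/access_log`.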

XI. Using if conditions in awk

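(1) Set the variable num=21. If the remainder of num divided by 2 (num % 2) is 0, the number is even; otherwise it is odd. In printf, %d prints an integer (right-aligned when a field width such as %5d is given) and takes its value from num, the argument listed after the closing double quote. printf does not append a newline, so the format string must end with \n.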

[root@localhost ~]$ awk 'BEGIN{num=21; if(num % 2 == 0)printf "%d even\n",num;else printf "%d odd\n",num}'
21 odd
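The same if/else can also run in the per-line task instead of BEGIN, testing a field of every input line. A sketch assuming a POSIX awk; `parity` is a hypothetical helper that reads integers, one per line, from standard input.

```shell
#!/bin/sh
# Classify the first field of each line as even or odd.
# n % 2 == 0 means the remainder of n divided by 2 is 0, i.e. n is even.
# printf adds no newline of its own, so each format ends in \n.
parity() {
  awk '{ if ($1 % 2 == 0) printf "%d even\n", $1; else printf "%d odd\n", $1 }'
}
```

For example, `printf '21\n4\n' | parity` prints `21 odd` and then `4 even`.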
©️2022 CSDN · Author: Runhao.luo