awk Command Notes

抓饼先生

Last modified: 2023-10-17 16:03:49

Tags: linux ubuntu bash

First published: 2023-06-06 14:18:38

Article link: https://blog.csdn.net/yinminsumeng/article/details/130618023

Command syntax

awk 'BEGIN { commands } PATTERN { commands } END { commands }'
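     BEGIN block           body block             END block

BEGIN and END are case-sensitive and must be written in uppercase.
Whitespace between the blocks is optional.
Built-in variables must also be written in uppercase.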

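Execution flow

1) The BEGIN block runs first, before any input is read.
It plays the role of loop initialization.
2) The body block runs once for each input record.
It plays the role of the loop body.
For multi-line text, records are separated by newlines by default, so the loop processes one line at a time.
3) Default processing of each record (each line):
The record is split into fields by the field separator (by default any run of spaces or tabs), and the fields are assigned to $1, $2, ... ($0 is the whole line).
The body block can then process and print the fields as needed.
4) After the last record, the END block runs.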
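The flow above can be sketched with a tiny program. This is only an illustration: the three-line input is made up and generated inline with printf.

```shell
# Hypothetical three-line input, generated inline.
out=$(printf 'a b\nc d\ne f\n' | awk '
BEGIN { print "start" }            # runs once, before any input is read
      { print NR ": " $2 " " $1 }  # runs once per record (line)
END   { print "records: " NR }     # runs once, after the last record
')
printf '%s\n' "$out"
```

The body block swaps the two fields of each line, and NR in the END block still holds the number of records read.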

Patterns

1) /regular expression/: /some string/
2) Relational expression: $2>10, NR%2==0
3) Match operators: ~ (matches) and !~ (does not match)
4) Range:
'NR==1, NR==10' selects lines 1 through 10
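A minimal sketch of the four pattern kinds, run against a small made-up table (the column names and values are assumptions for illustration only):

```shell
# Hypothetical input: a header line followed by four data lines.
input=$(printf 'id value\n1 5\n2 15\n3 25\n4 8\n')

# 1) Regular expression: lines that end in 5.
by_regex=$(printf '%s\n' "$input" | awk '/5$/ { print $1 }')

# 2) Relational expression: skip the header, keep rows whose value exceeds 10.
by_relation=$(printf '%s\n' "$input" | awk 'NR > 1 && $2 > 10 { print $1 }')

# 3) Match operator: rows whose first field is NOT purely numeric.
by_nomatch=$(printf '%s\n' "$input" | awk '$1 !~ /^[0-9]+$/ { print $1 }')

# 4) Range: from line 2 through line 3, inclusive.
by_range=$(printf '%s\n' "$input" | awk 'NR==2, NR==3 { print $2 }')

printf '%s\n' "$by_regex" "$by_relation" "$by_nomatch" "$by_range"
```

The NR > 1 guard in the second command matters because the header word "value" would otherwise be compared with 10 as a string.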

Example 1

$ cat /tmp/test.log 
2023-05-18 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571]  repetitionsBaseDelay_ = 100
2023-05-18 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571]  repetitionsMax_ = 3
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2

# Lines that contain SOMEIP
$ awk '/SOMEIP/' /tmp/test.log
2023-05-18 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571]  repetitionsBaseDelay_ = 100
2023-05-18 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571]  repetitionsMax_ = 3

# Lines that do not contain SOMEIP
$ awk '!/SOMEIP/' /tmp/test.log
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2

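# For lines whose field $6 matches the pattern, print what follows $6 and the ":" in $7, i.e. fields $8 onward
# ORS=" " makes every print end with a space, including print "\n", so each line after the first starts with a space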
$ awk 'BEGIN{ ORS=" " } $6~/SOMEIP/{ for(i=8; i<=NF; i++) { print $i } print "\n" }' /tmp/test.log
2023-05-18 05:08:56.871641 [info] [TID:571]  repetitionsBaseDelay_ = 100 
 2023-05-18 05:08:56.871645 [info] [TID:571]  repetitionsMax_ = 3 

Example 2

# File content: output of the top command, showing the five busiest threads
$ cat /data/misc/top_5.txt
Threads: 3211 total,   5 running, 3192 sleeping,   0 stopped,  14 zombie
  Mem: 11631808K total, 11530220K used,   101588K free,   240000K buffers
 Swap:  4194300K total,    262144 used,  4194044K free,  4813496K cached
800%cpu 111%user  47%nice 325%sys 286%idle   0%iow  21%irq  11%sirq   0%host
  TID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ THREAD          PROCESS
  286 logd         30  10  12G 221M 2.6M R 80.8   1.9 860:21.19 logd.writer     logd
31445 root         20   0  12G 6.4M 3.2M R 76.7   0.0   0:00.43 toybox          toybox
 2459 u10_a58      20   0  16G 119M  78M R 71.2   1.0 691:17.97 hmessageservice pushmessageservice
 4986 root         20   0  12G 7.6M 4.7M S 32.8   0.0 102:27.51 UsbFfs-worker   adbd
  547 system       -3  -8  12G  54M  39M D 27.3   0.4 250:51.13 surfaceflinger  surfaceflinger

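# Number the thread lines. $1~/[0-9]$/ matches lines whose first field ends in a digit, which here selects the rows that start with a numeric TID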
$ cat top_5.txt | awk 'BEGIN { line_num=1 } $1~/[0-9]$/{ print line_num++ ": " $0 }'
1:   286 logd         30  10  12G 221M 2.6M R 80.8   1.9 860:21.19 logd.writer     logd
2: 31445 root         20   0  12G 6.4M 3.2M R 76.7   0.0   0:00.43 toybox          toybox
3:  2459 u10_a58      20   0  16G 119M  78M R 71.2   1.0 691:17.97 hmessageservice pushmessageservice
4:  4986 root         20   0  12G 7.6M 4.7M S 32.8   0.0 102:27.51 UsbFfs-worker   adbd
5:   547 system       -3  -8  12G  54M  39M D 27.3   0.4 250:51.13 surfaceflinger  surfaceflinger

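# The next three commands produce the same output; they differ in how the regular expression is applied.
# Several pattern-action pairs: lines where $1~/[0-9]$/ matches are printed with a number, lines where $1!~/[0-9]$/ holds are printed unchanged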
$ cat top_5.txt | awk 'BEGIN { line_num=1 } $1~/[0-9]$/{ print line_num++ ": " $0 } $1!~/[0-9]$/{ print $0 } '
# Control the logic with if-else
$ cat top_5.txt | awk 'BEGIN { line_num=1 } { if ($1~/[0-9]$/) print line_num++ ": " $0;  else print $0; } '
# Look for a match with the match function, combined with if-else
$ cat top_5.txt | awk 'BEGIN { line_num=1 } { if (match($1, /[0-9]$/) != 0) print line_num++ ": " $0;  else print $0; } '
Threads: 3211 total,   5 running, 3192 sleeping,   0 stopped,  14 zombie
  Mem: 11631808K total, 11530220K used,   101588K free,   240000K buffers
 Swap:  4194300K total,    262144 used,  4194044K free,  4813496K cached
800%cpu 111%user  47%nice 325%sys 286%idle   0%iow  21%irq  11%sirq   0%host
  TID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ THREAD          PROCESS
1:   286 logd         30  10  12G 221M 2.6M R 80.8   1.9 860:21.19 logd.writer     logd
2: 31445 root         20   0  12G 6.4M 3.2M R 76.7   0.0   0:00.43 toybox          toybox
3:  2459 u10_a58      20   0  16G 119M  78M R 71.2   1.0 691:17.97 hmessageservice pushmessageservice
4:  4986 root         20   0  12G 7.6M 4.7M S 32.8   0.0 102:27.51 UsbFfs-worker   adbd
5:   547 system       -3  -8  12G  54M  39M D 27.3   0.4 250:51.13 surfaceflinger  surfaceflinger

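Built-in variables

FILENAME: name of the current input file
NR: number of input records read so far, counted across all input files
FNR: number of records read so far from the current file
NF: number of fields in the current record; $NF is the last field
ARGC: number of command-line arguments
ARGV: array of command-line arguments
$0: the full text of the current record
$n: the n-th field of the current record, for example $1, $2

FS: input field separator
OFS: output field separator
RS: input record separator
ORS: output record separator
FIELDWIDTHS: widths of fixed-width input fields (a gawk extension)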
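To make the difference between NR and FNR, and between FS, OFS, and ORS, more concrete, here is a small sketch. The two files and their contents are hypothetical and are created in a temporary directory.

```shell
# Two hypothetical input files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a:1\nb:2\n' > "$tmpdir/one.txt"
printf 'c:3\n'      > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{ print NR, FNR }' "$tmpdir/one.txt" "$tmpdir/two.txt")
printf '%s\n' "$counts"

# FS splits the input fields, OFS joins the comma-separated arguments of print,
# and ORS is written after every print.
joined=$(awk 'BEGIN { FS=":"; OFS="-"; ORS=";" } { print $2, $1 }' \
    "$tmpdir/one.txt" "$tmpdir/two.txt")
printf '%s\n' "$joined"

rm -rf "$tmpdir"
```

The second command prints everything on one line because ORS replaces the newline that print would normally emit.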

$ awk '{print FILENAME, NF, $0}' /tmp/test.log 
/tmp/test.log 8 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
/tmp/test.log 8 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2

$ awk '{print FILENAME "line>>" NR, $0}' /tmp/test.log 
/tmp/test.logline>>1 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
/tmp/test.logline>>2 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2

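Body block

/pattern/ { commands }

1) For each record that matches the regular expression pattern, run commands.
2) A program can contain several pattern-action pairs, for example:
(1) When a record matches pattern1, commands1 runs; when it matches pattern2, commands2 runs.
(2) next skips the remaining patterns and actions for the current record (similar to if/else). Without next, every pattern is tested against every record.

/pattern1/ {commands1; next} /pattern2/ {commands2}

3) If {commands} is omitted, the default action is print $0, so the pattern acts purely as a filter.
(1) With a plain string, this behaves like grep.
(2) Built-in variables and expressions allow more complex selection, for example keeping only the even-numbered lines.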

$ awk '/Max/' /tmp/test.log 
$ awk 'NR%2==0' /tmp/test.log

4) Inside commands, the output of print can be redirected to a file or piped to another command

awk 'NR%2==0 { print $1 > "/tmp/part.log" }' /tmp/test.log
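The effect of next in a program with several pattern-action pairs can be sketched as follows. The log-like input is invented for the example.

```shell
# Hypothetical input: three log-like lines.
input=$(printf 'error disk full\nwarn error rate high\ninfo ok\n')

# With next: a record handled by the first rule never reaches the second.
with_next=$(printf '%s\n' "$input" | awk '
/^error/ { print "E: " $0; next }
/error/  { print "mentions error: " $0 }
')

# Without next: the first record matches both rules and is printed twice.
without_next=$(printf '%s\n' "$input" | awk '
/^error/ { print "E: " $0 }
/error/  { print "mentions error: " $0 }
')

printf '%s\n---\n%s\n' "$with_next" "$without_next"
```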

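Command block

1) A block can contain several statements, either one per line (no trailing semicolon needed) or separated by semicolons on one line.
2) Control structures such as if-else, for, and while are supported.
3) Arrays are supported; indexes can be numbers or strings, so an array behaves like a map.
4) User-defined functions are supported:
function find_min(num1, num2) { if (num1 < num2) return num1; return num2 }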

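Built-in functions

(1) Math functions: sin, cos, log, sqrt, int, rand
(2) String functions: gsub, sub, substr, index, length, match, split, tolower, toupper, sprintf, strtonum (strtonum is gawk-only)

	sub(reg, str [, target]) 
	a) Replaces the text that matches reg with str.
	b) target is the string to modify; it defaults to $0 and can be a variable or a field $n.
	c) gsub has the same signature and replaces every match of reg; sub replaces only the first one.

	gensub(reg, str, h [, target])
	a) h selects which occurrence of reg to replace; "g" or "G" replaces all of them. gensub is a gawk extension and returns the result instead of modifying target.
	b) Inside str, "\\n" (n from 1 to 9) refers to the text matched by the n-th parenthesized group in reg.

	print / printf
	print works like echo in the shell and ends each output with a newline.
	printf works like printf in C and formats its output; a newline must be added explicitly with "\n".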

$ cat testfile 
line1
line2
line3

$ awk '{ print "["$0"]" }' testfile 
[line1]
[line2]
[line3]

$ awk '{ printf "[%s]\n", $0 }' testfile 
[line1]
[line2]
[line3]

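match function
The prototype is shown below.
With two arguments, match looks for the pattern re in str, like the regex test str ~ re. It returns the 1-based position where the match starts, or 0 if there is no match.
With three arguments (a gawk extension), array[0] holds the whole matched text and array[n] holds the text matched by the n-th parenthesized group in re; array[n, "start"] and array[n, "length"] give the start position and length of that group.

match(str, re [, array])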

$ echo 123 | awk '{ print match("aabaacaad", "(aa)", res); printf "result: %s, start: %d, length: %d\n", res[1], res[1, "start"], res[1, "length"] }'
1
result: aa, start: 1, length: 2

$ echo 123 | awk '{ print match("aa{b}aa{c}aad", "{(.)}", res); printf "result: %s, start: %d, length: %d\n", res[1], res[1, "start"], res[1, "length"] }'
3
result: b, start: 4, length: 1

$ echo 123 | awk '{ print match("aa{b}aa{c}aad", /{(.)}/, res); printf "result: %s, start: %d, length: %d\n", res[1], res[1, "start"], res[1, "length"] }'
3
result: b, start: 4, length: 1

# A call taken from the awk script at the end of this note
match($0, /.*tcontext=u:r:(.+):s0.*/, target_ctx);
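Where the three-argument form is not available, a similar extraction can be sketched with the two-argument match plus substr, using the RSTART and RLENGTH variables that match sets. The input line here is a shortened, made-up fragment of an audit record.

```shell
# Hypothetical fragment of an audit line.
out=$(echo 'tcontext=u:r:logd:s0 tclass=file' | awk '{
    if (match($0, /tclass=[a-z_]+/)) {
        # RSTART and RLENGTH describe the match; skip the 7 characters of "tclass=".
        print RSTART, RLENGTH, substr($0, RSTART + 7, RLENGTH - 7)
    }
}')
printf '%s\n' "$out"
```

The match starts at character 22 and is 11 characters long, so the substr call returns the 4 characters after "tclass=".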

(3) Time functions: mktime, strftime, systime
(4) Bit-manipulation functions: and, or, xor, compl, lshift, rshift
(5) Other functions and statements: close, fflush, exit, delete, getline, next, nextfile, return, system

# sub and gsub examples
$ cat /tmp/test.log 
2023-05-18 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571] client_timer_18215_1 repetitionsBaseDelay_ = 100
2023-05-18 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571] client_timer_18215_1 repetitionsMax_ = 3

# sub replaces only the first occurrence on each line
$ awk '{ sub(/2023-05-18/, "1234-56-78"); print $0 }' /tmp/test.log 
1234-56-78 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571] client_timer_18215_1 repetitionsBaseDelay_ = 100
1234-56-78 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571] client_timer_18215_1 repetitionsMax_ = 3

# gsub replaces every match on the line
$ awk '{ gsub(/2023-05-18/, "1234-56-78"); print $0 }' /tmp/test.log 
1234-56-78 05:08:56.965846   460  4936 I SOMEIP  : 1234-56-78 05:08:56.871641 [info] [TID:571] client_timer_18215_1 repetitionsBaseDelay_ = 100
1234-56-78 05:08:56.965863   460  4936 I SOMEIP  : 1234-56-78 05:08:56.871645 [info] [TID:571] client_timer_18215_1 repetitionsMax_ = 3

# Use gsub to strip leading and trailing spaces
$ awk '{gsub(/^ +| +$/,"")} {print "=" $0 "="}' onefile.txt
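The optional target argument and the return value of gsub are easy to overlook. A small sketch on an invented two-field line:

```shell
# Hypothetical input with the same text in both fields.
out=$(echo 'a-b-c a-b-c' | awk '{
    n = gsub(/-/, "+", $2)   # only field 2 is changed; n is the number of replacements
    print n, $0
}')
printf '%s\n' "$out"
```

Assigning to a field causes $0 to be rebuilt with OFS, which is why the first field is printed untouched next to the modified second field.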

Splitting records with several delimiters

$ cat test.log
2023-05-18 05:08:56.965846   460  4936 I SOMEIP

# Split on space, ".", and ":" together; by default only whitespace is used
$ awk -F "[ .:]" '{print $1,$2,$3,$4,$5}' test.log 
2023-05-18 05 08 56 965846

# Setting FS is equivalent to using -F
$ awk 'BEGIN{FS="[ .:]"} {print $1,$2,$3,$4,$5}' test.log 
2023-05-18 05 08 56 965846

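Defining and using shell variables

1) This is most useful in scripts and less so on the command line.
2) On the command line, variable assignments are written after the awk program text, and the program text comes right after awk.

# Define variables on the command line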
$ awk '{ print name"="age }' name=tom age=12 /tmp/test.log 
tom=12
tom=12

# The three awk commands below do the same thing
# Key point: a regex match is written $n~/pattern/; when the pattern is held in a variable, write $n~var
sig_list="1 2 3 4"
for sig in $sig_list
do
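	# 1) -v passes the shell variable in (the double quotes are optional when the value has no whitespace); 2) the fields to print are collected in out_buf so that print does not emit ORS after each one; 3) $3~sig_id matches against the pattern held in the variable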
    awk -v sig_id="$sig" '$3~sig_id{ out_buf=""; for(i=3; i<15; i++) { out_buf = out_buf " " $i } print out_buf; }' *.asc | sort | uniq
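	# Unlike the command above, no buffer is used. The output format is changed through the built-in ORS (newline by default): it is set to a space for the fields, then cleared before the final newline so that the next line does not start with a space.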
    awk -v sig_id="$sig" '$3~sig_id{ ORS=" "; for(i=3; i<15; i++) { print $i } ORS=""; print "\n" }' *.asc | sort | uniq
	# Another way to pass the shell variable: assign it on the command line after the awk program.
    awk '$3~sig_id{ ORS=" "; for(i=3; i<15; i++) { print $i } ORS=""; print "\n" }' sig_id=$sig *.asc | sort | uniq

    echo -e "-- $sig end   ---------------------------\n"
done
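One practical difference between the two ways of passing a variable is when the assignment takes effect. The sketch below uses a hypothetical shell variable named greeting and a one-line input on stdin.

```shell
greeting="hello world"   # hypothetical shell variable

# -v assigns before BEGIN runs, so the value is already visible there.
with_v=$(awk -v msg="$greeting" 'BEGIN { print "[" msg "]" }')

# An assignment written after the program is applied only when awk reaches it
# in the argument list: after BEGIN, but before the next input (here "-", stdin).
after_prog=$(printf 'x\n' | awk 'BEGIN { print "[" msg "]" } { print "[" msg "]" }' msg="$greeting" -)

printf '%s\n' "$with_v" "$after_prog"
```

In the second command the BEGIN block prints empty brackets, while the body block sees the assigned value.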

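Arrays

1) Arrays are associative, like a map.
2) They do not need to be declared before use.
3) A for...in loop visits the elements in an unspecified order; a for (i=1; ...) loop over numeric indexes is ordered.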

$ awk '
BEGIN {
str="this is a string"; 
len=split(str, array, " "); 
print length(array), len; 
for (i in array) 
	print i": "array[i]; 
}'

4 4
1: this
2: is
3: a
4: string

$ awk 'BEGIN {
str="this is a string"; 
len=split(str, array, " "); 
print length(array), len; 
for (i=1; i<=len; i++) 
	print i": "array[i]; 
}'
4 4
1: this
2: is
3: a
4: string

$ awk 'BEGIN{ 
arr["one"]=1; 
arr["two"]=2; 
arr["three"]=3; 

for (item in arr) 
	print item"->"arr[item] 
}'
three->3
two->2
one->1
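A common use of associative arrays is counting. The sketch below also shows the in operator and delete; the list of user names is made up for the example.

```shell
# Hypothetical input: one user name per line.
out=$(printf 'root\nlogd\nroot\nsystem\nroot\n' | awk '
{ count[$1]++ }                       # elements come into existence on first use
END {
    delete count["system"]            # remove one element
    if (!("system" in count))         # "in" tests membership without creating the key
        print "system removed"
    print "root=" count["root"], "logd=" count["logd"]
}')
printf '%s\n' "$out"
```

Indexing specific keys in the END block avoids relying on the order of a for...in loop.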

Flow control

$ cat /tmp/file.txt 
line 1
line 2
line 3
line 4
line 5
line 6

# if
$ awk '{ 
if (NR % 2 == 0) 
{
	print $0
} 
else if (NR %3 == 0) 
{
	print $0
} 
}' /tmp/file.txt 

line 2
line 3
line 4
line 6

# while loop
$ awk 'BEGIN{ 
count=3; 
while (count>0) 
{
	print count; 
	count--;
} 
}' /tmp/file.txt 
3
2
1

# for loop
$ awk 'BEGIN{ 
for(count=3; count>0; count--) 
{
	print count;
} 
}' /tmp/file.txt 
3
2
1

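Control statements

break: leaves the enclosing while/for loop.
continue: moves on to the next iteration of the loop.
next: moves on to the next record. If the body block is seen as a loop body, next acts like continue.
exit: in the body block, stops processing input and runs END; in END, terminates the program. If the body block is seen as a loop body, exit acts like break.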
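The analogy between next/exit and continue/break can be sketched on a short made-up sequence of numbers:

```shell
# Hypothetical input: the numbers 1 to 5, one per line.
out=$(printf '1\n2\n3\n4\n5\n' | awk '
$1 % 2 == 0 { next }          # skip even records, like continue
$1 > 3      { exit }          # stop reading input, like break; END still runs
            { sum += $1 }
END         { print "sum=" sum, "last NR=" NR }
')
printf '%s\n' "$out"
```

Only 1 and 3 are added: 2 and 4 are skipped by next, and reading stops at 5, so NR is 5 when END runs.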

Numeric calculation examples


$ cat /tmp/num.txt
1
2
3
4
5
# Compute the mean of the column
$ awk 'BEGIN{ sum=0; } { sum+=$1; } END{ print sum/NR }' /tmp/num.txt 
3

$ cat /tmp/num.txt
1 2 3 4 5
11 22 33 44 55
10 20 30 40 50
# Compute the mean of each row
$ awk '{ sum=0; for(i=1; i<=NF; i++) sum+=$i; print sum/NF }' /tmp/num.txt 
3
33
30
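In the same spirit as the row means above, a per-column mean can be sketched by keeping one running sum per column in an array. The 3x3 input is invented, and every row is assumed to have the same number of fields.

```shell
# Hypothetical 3x3 table of numbers.
out=$(printf '1 2 3\n11 22 33\n6 6 6\n' | awk '
{
    for (i = 1; i <= NF; i++) sum[i] += $i   # one running sum per column
    if (NF > cols) cols = NF                 # remember the widest row
}
END {
    for (i = 1; i <= cols; i++)
        printf "%s%s", sum[i] / NR, (i < cols ? " " : "\n")
}')
printf '%s\n' "$out"
```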

User-defined functions

function find_min(num1, num2)
{
  if (num1 < num2)
    return num1
  return num2
}
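A function definition sits at the top level of the program, next to the pattern-action blocks. A minimal sketch of calling find_min from a body block, on made-up pairs of numbers:

```shell
# Hypothetical input: two numbers per line.
out=$(printf '7 3\n2 9\n5 5\n' | awk '
function find_min(num1, num2) {
    if (num1 < num2)
        return num1
    return num2
}
{ print find_min($1, $2) }
')
printf '%s\n' "$out"
```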

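Using a script file

This example processes SELinux denial messages on Android. From the information in the log it assembles the matching sepolicy rules, which can be placed directly in the corresponding .te file.
1) Inside a script file the program does not need to be wrapped in single quotes.
2) Run it with awk -f followed by the script file.
3) The script uses the three-argument form of match, so it needs gawk.

# The log line below is used to illustrate what the script does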
# 06-09 08:01:28.747 28112 28112 I toybox  : type=1400 audit(0.0:497657): avc: denied { read } for scontext=u:r:YourProcess:s0 tcontext=u:r:logd:s0 tclass=file permissive=1

# convert-selinux.awk
{
    match($0, /.*{ (.+) }.*/, perm);
    match($0, /.*tcontext=u:r:(.+):s0.*/, target_ctx);
    match($0, /.*scontext=u:r:(.+):s0 tcontext.*/, source_ctx);
    match($0, /.*tclass=(.+) permissive=.*/, class);

    if (target_ctx[1] == "") {
        match($0, /.*tcontext=u:object_r:(.+):s0.*/, target_ctx);
    }

    # Reject the record if any of the four pieces could not be extracted
    if (target_ctx[1] == "" \
        || source_ctx[1] == "" \
        || class[1] == "" \
        || perm[1] == "") {
       print "Error: Can not parse the record: "
       print $0

       next
    }

    # print "debug: " source_ctx[1] ", " target_ctx[1] ", " class[1] ", " perm[1]

    # e.g. allow YourProcess init:dir { r_dir_perms };
    if (class[1] == "file") {
        print "allow " source_ctx[1] " " target_ctx[1] ":" class[1] " { r_file_perms };";
    } else if (class[1] == "dir") {
        print "allow " source_ctx[1] " " target_ctx[1] ":" class[1] " { r_dir_perms };";
    } else {
        print "Error: Can not handle class: " class[1]
        print $0
    }

}

# Log file contents
$ cat se-log2.txt 
06-09 08:01:23.275 28105 28105 I toybox  : type=1400 audit(0.0:497247): avc: denied { getattr } for path="/proc/2" dev="proc" ino=1520277 scontext=u:r:YourProcess:s0 tcontext=u:r:kernel:s0 tclass=dir permissive=1
06-09 08:01:23.283 28105 28105 I toybox  : type=1400 audit(0.0:497249): avc: denied { read } for scontext=u:r:YourProcess:s0 tcontext=u:r:vendor_init:s0 tclass=file permissive=1
06-09 08:01:28.731 28112 28112 I toybox  : type=1400 audit(0.0:497655): avc: denied { getattr } for path="/proc/1" dev="proc" ino=14304 scontext=u:r:YourProcess:s0 tcontext=u:r:init:s0 tclass=dir permissive=1
06-09 08:01:28.747 28112 28112 I toybox  : type=1400 audit(0.0:497657): avc: denied { read } for scontext=u:r:YourProcess:s0 tcontext=u:r:logd:s0 tclass=file permissive=1
06-09 08:01:28.747 28112 28112 I toybox  : type=1400 audit(0.0:497658): avc: denied { read } for scontext=u:r:YourProcess:s0 tcontext=u:r:lmkd:s0 tclass=file permissive=1
06-09 08:01:28.747 28112 28112 I toybox  : type=1400 audit(0.0:497659): avc: denied { read } for scontext=u:r:YourProcess:s0 tcontext=u:r:servicemanager:s0 tclass=file permissive=1

# Process the log file with the script to generate sepolicy rules
# The SELinux log can also be piped in and processed on the fly
$ awk -f convert-selinux.awk se-log2.txt 
allow YourProcess kernel:dir { r_dir_perms };
allow YourProcess vendor_init:file { r_file_perms };
allow YourProcess init:dir { r_dir_perms };
allow YourProcess logd:file { r_file_perms };
allow YourProcess lmkd:file { r_file_perms };
allow YourProcess servicemanager:file { r_file_perms };
