Advanced Linux Text Processing: gawk printf and Functions (Part 6)

Original link: http://blog.51cto.com/yolynn/1893508


1. Formatting Output with printf

printf lets you print results in exactly the format you want, flexibly and simply.

Syntax:

printf "print format", variable1,variable2,etc.

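Special characters in printf: the format string accepts escape sequences such as \n (newline) and \t (tab).

printf does not use OFS or ORS; it prints data strictly according to the "format" string.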
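
The difference from print can be sketched as follows. This is a minimal illustration: the values "-" and "!\n" for OFS and ORS and the shell variable name `ofs_demo` are arbitrary choices, not from the original article.

```shell
# print joins its arguments with OFS and ends the record with ORS;
# printf ignores both and emits only what the format string says.
ofs_demo=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    print "a", "b"
    printf "%s %s\n", "a", "b"
}')
printf '%s\n' "$ofs_demo"
```

The first line comes out as a-b! and the second as a b, which is why a printf format normally has to end with its own \n.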

printf format specifiers:

Example 1:

[root@localhost ~]# cat pri.awk 
BEGIN {
    printf "s--> %s\n", "String"
    printf "c--> %c\n", "String"
    printf "s--> %s\n", 101.23
    # Note: "101,23" passes two arguments (101 and 23); the extra 23 is ignored.
    printf "d--> %d\n", 101,23
    printf "e--> %e\n", 101,23
    printf "f--> %f\n", 101,23
    printf "g--> %g\n", 101,23
    printf "o--> %o\n", 0x8
    printf "x--> %x\n", 16
    printf "percentage--> %%\n", 17
}
[root@localhost ~]# awk -f pri.awk 
s--> String
c--> S
s--> 101.23
d--> 101
e--> 1.010000e+02
f--> 101.000000
g--> 101
o--> 10
x--> 10
percentage--> %
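
As a compact cross-check of the specifiers in Example 1, the sketch below gives each conversion exactly one argument. The sample values (101.23, 255, 8, 65) and the variable name `spec_demo` are illustrative assumptions.

```shell
# One argument per conversion, so nothing is silently ignored.
spec_demo=$(awk 'BEGIN {
    # %d truncates, %.2f rounds to two decimals, %e is scientific notation,
    # %g picks the shorter form, %x is hex, %o is octal, %c maps a number to a character
    printf "%d|%.2f|%e|%g|%x|%o|%c\n", 101.23, 101.23, 101.23, 101.23, 255, 8, 65
}')
printf '%s\n' "$spec_demo"
```

With a decimal point instead of a comma, %e and %f keep the fractional part (1.012300e+02 rather than 1.010000e+02).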

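printf modifiers:

Modifier: #[.#] The first number sets the display width; the second # sets the precision after the decimal point.

- left-justify (the default is right-justify), e.g. %-15s

+ show the sign of a numeric value, e.g. %+d; zero also gets a plus sign

$ is not a modifier: to put a dollar sign in front of a price, write a literal $ in the format string just before the %

0 pad with zeros on the left instead of spaces when a width is given, e.g. "%05d" instead of "%5d" (the flag applies to numeric conversions)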
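
A short sketch of these modifiers side by side may help. The value 42, the price 3.14159 and the variable name `fmt_demo` are made up for illustration.

```shell
fmt_demo=$(awk 'BEGIN {
    printf "[%5d]\n", 42          # width 5, right-justified by default
    printf "[%-5d]\n", 42         # width 5, left-justified
    printf "[%+d]\n", 42          # explicit sign
    printf "[%05d]\n", 42         # zero-padded to width 5
    printf "[$%.2f]\n", 3.14159   # literal dollar sign, two decimals
}')
printf '%s\n' "$fmt_demo"
```

The brackets only make the padding visible; they are not part of the modifier syntax.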

Example 2:

[root@localhost ~]# awk 'BEGIN { printf "|%6s%7.3f|\n", "Good","2.1" }'  
|  Good  2.100|
[root@localhost ~]# awk 'BEGIN { printf "|%-6s%-7.3f|\n", "Good","2.1" }'
|Good  2.100  |

Redirecting output to a file:

awk can redirect the output of print and printf to a named file.

Example 3:

[root@localhost ~]# awk 'BEGIN{a=5;printf "%3d\n",a> "report.txt"}'
[root@localhost ~]# cat report.txt 
  5

Alternatively, redirect in the shell: awk -f script.awk file > redirectfile
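
One detail of in-script redirection is worth a sketch: within a single awk run, ">" truncates the file only when it is first opened, so later writes to the same name are appended. The temporary file and the names `tmp`, `out` and `report` below are assumptions for the demonstration.

```shell
tmp=$(mktemp)
awk -v out="$tmp" 'BEGIN {
    printf "%3d\n", 5 > out   # first write opens and truncates the file
    printf "%3d\n", 6 > out   # same open file, so this line is added after the first
    close(out)                # flush and release the file
}'
report=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$report"
```

Passing the file name with -v keeps the awk program itself free of hard-coded paths.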

Running an awk script as an executable:

Example 4:

[root@localhost ~]# cat fz.awk      
#!/bin/awk -f
BEGIN {
FS=",";
OFS=",";
total1 = total2 = total3 = total4 = total5 = 10;
total1 += 5; print total1;
total2 -= 5; print total2;
total3 *= 5; print total3;
total4 /= 5; print total4;
total5 %= 5; print total5;
}
[root@localhost ~]# chmod +x fz.awk   
[root@localhost ~]# ./fz.awk        
15
5
50
2
0

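2. awk Built-in and User-Defined Functions

Numeric functions:

The rand() function

rand() returns a random number from 0 up to, but not including, 1: it may return 0 but never 1. The numbers look random within a single awk run, but they are predictable across runs, because awk starts from the same seed every time unless srand() is called.

Example 1: generate 1000 random integers (0 to 99)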

[root@localhost ~]# cat occ.awk 
BEGIN {
    while(i<1000)
    {
        n = int(rand()*100);
        rnd[n]++;
        i++;
    }
    # int(rand()*100) only yields 0..99, so rnd[100] stays empty (see the last output line)
    for(i=0;i<=100;i++)
    {
        print i,"Occured",rnd[i],"times";
    }
}
[root@localhost ~]# awk -f occ.awk 
0 Occured 11 times
1 Occured 8 times
2 Occured 9 times
3 Occured 15 times
4 Occured 16 times
5 Occured 5 times
6 Occured 8 times
7 Occured 9 times
8 Occured 7 times
9 Occured 7 times
10 Occured 11 times
11 Occured 7 times
12 Occured 10 times
13 Occured 9 times
14 Occured 6 times
15 Occured 18 times
16 Occured 10 times
17 Occured 10 times
18 Occured 9 times
19 Occured 8 times
20 Occured 11 times
21 Occured 13 times
22 Occured 10 times
23 Occured 9 times
24 Occured 15 times
25 Occured 8 times
26 Occured 3 times
27 Occured 17 times
28 Occured 9 times
29 Occured 13 times
30 Occured 11 times
31 Occured 9 times
32 Occured 12 times
33 Occured 12 times
34 Occured 9 times
35 Occured 6 times
36 Occured 13 times
37 Occured 15 times
38 Occured 6 times
39 Occured 9 times
40 Occured 7 times
41 Occured 8 times
42 Occured 6 times
43 Occured 8 times
44 Occured 10 times
45 Occured 7 times
46 Occured 10 times
47 Occured 8 times
48 Occured 16 times
49 Occured 12 times
50 Occured 6 times
51 Occured 15 times
52 Occured 6 times
53 Occured 12 times
54 Occured 8 times
55 Occured 13 times
56 Occured 6 times
57 Occured 16 times
58 Occured 5 times
59 Occured 7 times
60 Occured 11 times
61 Occured 12 times
62 Occured 14 times
63 Occured 11 times
64 Occured 9 times
65 Occured 6 times
66 Occured 7 times
67 Occured 10 times
68 Occured 8 times
69 Occured 12 times
70 Occured 13 times
71 Occured 9 times
72 Occured 10 times
73 Occured 11 times
74 Occured 7 times
75 Occured 13 times
76 Occured 13 times
77 Occured 10 times
78 Occured 5 times
79 Occured 12 times
80 Occured 17 times
81 Occured 8 times
82 Occured 7 times
83 Occured 10 times
84 Occured 12 times
85 Occured 12 times
86 Occured 11 times
87 Occured 14 times
88 Occured 4 times
89 Occured 8 times
90 Occured 15 times
91 Occured 10 times
92 Occured 15 times
93 Occured 8 times
94 Occured 11 times
95 Occured 5 times
96 Occured 12 times
97 Occured 11 times
98 Occured 7 times
99 Occured 11 times
100 Occured  times

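Note: 1000 draws spread over only 100 possible values must repeat; each value turns up about 10 times on average.

The srand(n) function

srand(n) initializes the random number generator with the seed n. Every run started with the same n produces the same sequence. If n is omitted, awk uses the current date and time as the seed.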
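
The repeatability of a fixed seed can be checked with a small sketch. The helper name `seq_for_seed` and the choice of three draws are hypothetical.

```shell
# Print three pseudo-random integers (0..99) for the seed given as $1.
seq_for_seed() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        for (i = 0; i < 3; i++)
            printf "%d ", int(rand() * 100)
        print ""
    }'
}
first=$(seq_for_seed 5)
second=$(seq_for_seed 5)
printf 'run 1: %s\nrun 2: %s\n' "$first" "$second"
```

Both runs print the same three numbers; the actual values depend on the awk implementation, so none are shown here.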

Example 2: generate 5 distinct random numbers between 5 and 50

[root@localhost ~]# cat srand.awk 
BEGIN {
    # Initialize the seed with 5.
    srand(5);
    # Generate 5 distinct numbers in total.
    total = 5;
    # Upper bound: int(rand()*max) yields 0..49.
    max = 50;
    count = 0;
    while(count < total)
    {
        rnd = int(rand()*max);
        if( array[rnd] == 0 )
        {
            count++;
            array[rnd]++;
        }
    }
    # Values below 5 are counted above but are not printed here.
    for ( i=5;i<=max;i++)
    {
        if (array[i])
            print i;
    }
}
[root@localhost ~]# awk -f srand.awk 
14
16
23
33
35
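
Example 2 draws from 0 to 49 and only prints values of 5 or more. If every draw should land in a closed range, one common approach is to scale and shift rand(). The sketch below assumes the bounds 5 and 50 and uses hypothetical names (`lo`, `hi`, `in_range`).

```shell
in_range=$(awk 'BEGIN {
    srand(5)
    lo = 5; hi = 50
    ok = 1
    for (i = 0; i < 1000; i++) {
        n = lo + int(rand() * (hi - lo + 1))   # integer in lo..hi inclusive
        if (n < lo || n > hi)
            ok = 0                             # would flag an out-of-range draw
    }
    print ok
}')
printf '%s\n' "$in_range"
```

Because rand() never returns 1, int(rand() * 46) is at most 45, so the largest possible result is 50.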

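Common string functions:

The length function:

length([s]) returns the length of the string s, or of $0 when s is omitted. In gawk it also returns the number of elements when given an array.

Example 1: length of a string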

[root@bash ~]# awk 'BEGIN{print length("young")}'
5

Example 2: length of an array

[root@young ~]# ss -tnl|awk '/LISTEN/{split($4,port,":"); print port[length(port)]}'
22
631
25
22
631
25
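
Where length(array) is not available (it is an extension that some older awk implementations lack), the count returned by split can serve the same purpose. A minimal sketch, with a hypothetical helper name `last_field` and made-up addresses:

```shell
# Print the part after the last colon of an address such as 0.0.0.0:22 or :::631.
last_field() {
    printf '%s\n' "$1" | awk '{
        n = split($1, part, ":")   # split returns the number of pieces
        print part[n]              # the last piece is the port
    }'
}
last_field 0.0.0.0:22
last_field :::631
```

Indexing with the return value avoids a second pass over the array.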

Tip: grep with a Perl-compatible regular expression (-P) can also extract the port numbers.

[root@young ~]# ss -tnl|grep -Po ':\K\d+(?= )'
22
631
25
22
631
25

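The sub function:

sub(r,s,[t]) searches the string t (default $0) for the pattern r, which may be a regular expression, and replaces the first match with the string s.

Example 1: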

[root@bash ~]# awk 'BEGIN{a="geek young";sub("young","xixi",a);print a}' 
geek xixi  # note: string literals must be quoted

Example 2:

[root@bash ~]# echo "geek young hahahaha"|awk '
>{sub(/\<young\>/,"xixi",$2);  # a regex literal between slashes is not quoted
>print $2}'   
xixi

Example 3:

[root@bash ~]# echo "2008:08:08:08 08:08:08" | awk 'sub(/:/,"",$1)'
200808:08:08 08:08:08

Example 4:

[root@bash ~]# cat sub.awk
BEGIN {
state="CA is California"
sub("C[Aa]","KA",state);
print state;
}
[root@bash ~]# awk -f sub.awk
KA is California

The gsub function:

gsub(r,s,[t]) searches the string t (default $0) for the pattern r, which may be a regular expression, and replaces every match with s.

Example 1:

[root@bash ~]# echo "2008:08:08:08 08:08:08" | awk 'gsub(/:/,"",$1)'
2008080808 08:08:08
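
Both functions also return the number of replacements made, and an & in the replacement stands for the matched text. A small sketch with made-up strings and a hypothetical variable name `sub_demo`:

```shell
sub_demo=$(awk 'BEGIN {
    s = "a:b:c"
    n1 = sub(/:/, "-", s)        # first match only
    print n1, s
    u = "a:b:c"
    n2 = gsub(/:/, "-", u)       # every match
    print n2, u
    t = "price 42"
    gsub(/[0-9]+/, "[&]", t)     # & is replaced by the matched text
    print t
}')
printf '%s\n' "$sub_demo"
```

The return value is what makes patterns like awk 'sub(/:/,"",$1)' print a line: a non-zero count is true, so the default print action runs.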

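The split function:

split(s,array,[r]) splits the string s on the separator r (FS when r is omitted) and stores the pieces in array: the first at index 1, the second at index 2, and so on.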
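
The separator may also be a regular expression, and the return value gives the number of pieces, which allows an ordered loop. A sketch under those assumptions (the input "a1b22c" and the names `piece` and `split_demo` are illustrative):

```shell
split_demo=$(awk 'BEGIN {
    n = split("a1b22c", piece, /[0-9]+/)   # any run of digits separates pieces
    for (i = 1; i <= n; i++)               # walk indexes 1..n in order
        printf "%d=%s ", i, piece[i]
    print n
}')
printf '%s\n' "$split_demo"
```

A counted loop from 1 to n visits the pieces in order, whereas for (x in array) makes no promise about order.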

Example 1:

[root@bash ~]# echo "192.168.1.1:80"|awk '
>{split($1,ip,":");
>print ip[1],"----",ip[2]}'                       
192.168.1.1 ---- 80

Example 2:

[root@bash ~]# netstat -tan | awk '
>/^tcp\>/{split($5,ip,":");
>count[ip[1]]++}  # using a value from one array as the index of another and incrementing it is the usual way to count repeats
>END{for (i in count){print i,count[i]}}'
116.211.167.193 3
0.0.0.0 4
192.168.1.116 1

Example 3:

[root@bash ~]# cat items-sold1.txt   
101:2,10,5,8,10,12
102:0,1,4,3,0,2
103:10,6,11,20,5,13
104:2,3,4,0,6,5
105:10,2,5,7,12,6
[root@bash ~]# cat split.awk
BEGIN {
FS=":"
} {
split($2,quantity,",");
total=0;
for(x in quantity)
total=total+quantity[x];
print "Item",$1,":",total,"quantities sold";
}
[root@bash ~]# awk -f split.awk items-sold1.txt
Item 101 : 47 quantities sold
Item 102 : 10 quantities sold
Item 103 : 65 quantities sold
Item 104 : 20 quantities sold
Item 105 : 42 quantities sold

The substr function

Syntax:

substr(input-string,location,length)

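substr extracts part of a string (a substring). In the syntax above:
input-string: the string that contains the substring
location: the starting position of the substring (the first character is position 1)
length: the number of characters to take, starting at location. This argument is optional; if it is omitted, substr takes everything from location to the end of the string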
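
The three arguments can be tried out on a fixed string. The string "Hello, awk" and the variable name `substr_demo` are made up for this sketch.

```shell
substr_demo=$(awk 'BEGIN {
    s = "Hello, awk"
    print substr(s, 8)        # from position 8 to the end
    print substr(s, 1, 5)     # five characters starting at position 1
    print substr(s, 8, 100)   # a length past the end is clipped
}')
printf '%s\n' "$substr_demo"
```

Positions count from 1, so substr(s, 1, 5) starts at the very first character.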

Example 1: print from the 5th character to the end of each line

[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# awk '{ print substr($0,5) }' items.txt
HD Camcorder,Video,210,10
Refrigerator,Appliance,850,2
MP3 Player,Audio,270,15
Tennis Racket,Sports,190,20
Laser Printer,Office,475,5

Example 2: print 5 characters of the 2nd field, starting at its 1st character

[root@localhost ~]# awk -F"," '{ print substr($2,1,5) }' items.txt
HD Ca
Refri
MP3 P
Tenni
Laser

Calling shell commands

The two-way pipe |&

gawk can use "|&" to communicate with an external process in both directions.

Example 1:

[root@localhost ~]# cat doub.awk 
BEGIN {
    command = "sed 's/Awk/Sed and Awk/'"
    print "Awk is Great!" |& command
    close(command,"to");  # close the writing end so that sed sees end-of-input and emits its output
    command |& getline tmp
    print tmp;
    close(command);
}
[root@localhost ~]# awk -f doub.awk 
Sed and Awk is Great!

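Explanation: "|&" denotes a two-way pipe. The command on the right of "|&" receives as input whatever the left side prints. close(command,"to") closes the "to" end (awk's writing side) once all input has been sent, so the command sees end-of-input. command |& getline tmp then reads the command's output into the variable tmp. Finally, close(command) closes the command.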
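
The same send, close, read, close sequence works with commands that produce several lines. The sketch below uses sort as the co-process and assumes gawk is installed under that name; it falls back to a message otherwise. The words sent and the names `cmd`, `line`, `out` and `sorted` are illustrative.

```shell
if command -v gawk >/dev/null 2>&1; then
    sorted=$(gawk 'BEGIN {
        cmd = "sort"
        print "pear"  |& cmd
        print "apple" |& cmd
        close(cmd, "to")                      # end of input: sort can now write
        while ((cmd |& getline line) > 0)     # read every line sort sends back
            out = (out == "" ? line : out " " line)
        close(cmd)
        print out
    }')
else
    sorted="gawk not installed"
fi
printf '%s\n' "$sorted"
```

Without the close(cmd, "to") call, sort would keep waiting for more input and the getline loop would block.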

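The system function

system(cmd) runs the string cmd as an operating system command, exactly as given. The command's output goes straight to standard output instead of back into awk, and system() returns the command's exit status (this is where it differs from the two-way pipe).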
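
The return value can be used to react to success or failure. A minimal sketch using the standard true and false commands (the variable names `ok`, `bad` and `status_demo` are arbitrary):

```shell
status_demo=$(awk 'BEGIN {
    ok  = system("true")     # exits with status 0
    bad = system("false")    # exits with a non-zero status
    print "true ->", ok
    print "false ->", (bad != 0 ? "non-zero" : 0)
}')
printf '%s\n' "$status_demo"
```

To capture what a command prints, rather than its status, use a pipe into getline instead.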

Example 1:

[root@localhost ~]# awk 'BEGIN{system("hostname");}' # no print needed: the command writes its own output
localhost.localdomain  
[root@localhost ~]# awk 'BEGIN{system("pwd")}'
/root
[root@localhost ~]# awk 'BEGIN{system("date")}'
Fri Jan 20 23:57:55 CST 2017

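The getline function

getline controls how awk reads data from the input file (or from other files). Note that when a plain getline succeeds, awk updates the built-in variables NF, NR, FNR and $0.

Example 1: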

[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# awk -F"," '
>{getline;print $0;}' items.txt # like the n command in sed, this changes awk's normal flow
102,Refrigerator,Appliance,850,2
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5

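When the body starts, before any statement runs, awk has read the first line of items.txt into $0.
getline: forces awk to read the next line into $0 (overwriting the previous content).
print $0: $0 now holds the second line, so print $0 prints the second line of the file (not the first).
The body keeps running this way and prints only the even-numbered lines. (Line 105 is printed as well: on the last line getline hits end of file and leaves $0 unchanged, so print $0 outputs the fifth line.)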
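
If the trailing odd line should not be printed, the return value of getline can be tested: it is 1 when a line was read and 0 at end of file. A sketch with five made-up input lines and a hypothetical variable name `even_lines`:

```shell
even_lines=$(printf '%s\n' one two three four five |
    awk '{ if ((getline) > 0) print }')   # print only when a next line really exists
printf '%s\n' "$even_lines"
```

Here the fifth line is skipped, because getline returns 0 at end of file and the print is not reached.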

Instead of placing what getline reads into $0, you can store it in a variable.

Example 2: print the odd-numbered lines

[root@localhost ~]# awk -F"," '{getline tmp; print $0;}' items.txt
101,HD Camcorder,Video,210,10
103,MP3 Player,Audio,270,15
105,Laser Printer,Office,475,5

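Explanation:

When the body starts, before any statement runs, awk has read the first line of items.txt into $0.
getline tmp: forces awk to read the next line and store it in the variable tmp.
print $0: $0 still holds the first line, because getline tmp does not overwrite $0, so the first line is printed (not the second).
The body keeps running this way and prints only the odd-numbered lines.

Example 3: getline from another file into a variable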

[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# cat items-sold.txt 
101 2 10 5 8 10 12
102 0 1 4 3 0 2
103 10 6 11 20 5 13
104 2 3 4 0 6 5
105 10 2 5 7 12 6
[root@localhost ~]# awk -F"," '{
>print $0; 
>getline tmp < "items-sold.txt";
>print tmp;}' items.txt
101,HD Camcorder,Video,210,10
101 2 10 5 8 10 12
102,Refrigerator,Appliance,850,2
102 0 1 4 3 0 2
103,MP3 Player,Audio,270,15
103 10 6 11 20 5 13
104,Tennis Racket,Sports,190,20
104 2 3 4 0 6 5
105,Laser Printer,Office,475,5
105 10 2 5 7 12 6
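
When the second file may be shorter than the first, checking the getline return value (1 for a line, 0 for end of file, -1 for an error) avoids printing a stale variable. The sketch below builds its own small side file; the file contents and the names `side`, `extra` and `paired` are assumptions.

```shell
side=$(mktemp)
printf '%s\n' 'A 1' 'B 2' > "$side"        # side file with only two lines
paired=$(printf '%s\n' first second third |
    awk -v side="$side" '{
        if ((getline extra < side) > 0)    # a line was read from the side file
            print $0 " | " extra
        else
            print $0 " | (no more data)"
    }
    END { close(side) }')
rm -f "$side"
printf '%s\n' "$paired"
```

The parentheses around the getline expression matter: they keep the comparison from being read as part of the file name.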

Example 4: running an external command with getline

[root@localhost ~]# cat get.awk 
BEGIN {
    FS=",";
    "date" | getline
    close("date")
    print "Timestamp:" $0
} {
if ( $5 <= 5)
    print "Buy More:Order",$2,"immediately!"
else
    print "Sell More:Give discount on",$2,"immediately!"
}
[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# awk -f get.awk items.txt 
Timestamp:Sat Jan 21 00:23:53 CST 2017
Sell More:Give discount on HD Camcorder immediately!
Buy More:Order Refrigerator immediately!
Sell More:Give discount on MP3 Player immediately!
Sell More:Give discount on Tennis Racket immediately!
Buy More:Order Laser Printer immediately!

Example 5: instead of storing the command output in $0, you can store it in any awk variable

[root@localhost ~]# cat get2.awk              
BEGIN {FS=",";
    "date" | getline timestamp
    close("date")
    print "Timestamp:" timestamp
} {
if ( $5 <= 5)
    print "Buy More: Order",$2,"immediately!"
else
    print "Sell More: Give discount on",$2,"immediately!"
}
[root@localhost ~]# awk -f get2.awk items.txt 
Timestamp:Sat Jan 21 00:26:29 CST 2017
Sell More: Give discount on HD Camcorder immediately!
Buy More: Order Refrigerator immediately!
Sell More: Give discount on MP3 Player immediately!
Sell More: Give discount on Tennis Racket immediately!
Buy More: Order Laser Printer immediately!
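
A command that prints several lines can be read in a loop, one getline call per line. In this sketch the command string "echo one; echo two" and the names `cmd`, `line`, `last` and `cmd_lines` are made up.

```shell
cmd_lines=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0) {   # one iteration per line of output
        n++
        last = line
    }
    close(cmd)                           # lets the same command be run again later
    print n, last
}')
printf '%s\n' "$cmd_lines"
```

Keeping the command in a variable guarantees that close() receives exactly the same string that opened the pipe.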

User-defined functions in awk

Format:

function name ( parameter, parameter, ... ) {
statements
return expression
}

Example 1:

[root@localhost ~]# cat fun.awk 
function max(v1,v2) {
    # return the larger argument directly, without setting a global variable
    return v1 > v2 ? v1 : v2
}
BEGIN{a=3;b=2;print max(a,b)}
[root@localhost ~]# awk -f fun.awk 
3
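
awk has no local-variable declaration; by convention, extra parameters that callers never pass serve as locals. A sketch of that convention, with a hypothetical function `sum_list` and made-up data:

```shell
func_demo=$(awk '
# Parameters after the wide gap (parts, n, i, total) are used as local variables.
function sum_list(list, sep,    parts, n, i, total) {
    n = split(list, parts, sep)
    for (i = 1; i <= n; i++)
        total += parts[i]
    return total
}
BEGIN {
    i = "untouched"                        # global i, not changed by the function
    print sum_list("2,10,5,8,10,12", ",")
    print i
}')
printf '%s\n' "$func_demo"
```

Any variable that is not in the parameter list is global, so a loop counter left out of the list would overwrite the caller's i.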

Reposted from: https://blog.51cto.com/yolynn/1893508
