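Opinion: the best way to learn to write scripts is to start writing them! Once you have written a fair amount, go back and read the rules (the awk manual).
I. Variables and built-in variables:
Beginner problem 1: compute the average compression multiple for each bit rate
The file bitRate contains: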
resolution=1280,bitRate=1000000,YUV/H264multiple=202)
resolution=1280,bitRate=1000000,YUV/H264multiple=232)
resolution=1280,bitRate=1000000,YUV/H264multiple=173)
resolution=1280,bitRate=2000000,YUV/H264multiple=156)
resolution=1280,bitRate=2000000,YUV/H264multiple=146)
resolution=1280,bitRate=2000000,YUV/H264multiple=153)
resolution=1280,bitRate=3000000,YUV/H264multiple=153)
resolution=1280,bitRate=3000000,YUV/H264multiple=154)
resolution=1280,bitRate=4000000,YUV/H264multiple=72)
resolution=1280,bitRate=4000000,YUV/H264multiple=73)
resolution=1280,bitRate=4000000,YUV/H264multiple=75)
......
The shell script:
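#!/bin/bash
#
# Caution: this awk program assumes that records with the same bitRate are
# adjacent in the input. If bit rates are interleaved, each run is averaged separately.
# Expected line format: "resolution=1280,bitRate=2000000,YUV/H264multiple=144)"
awk '
BEGIN { FS = "[(),=]" }    # split on "(", ")", "," or "="; $4 is the bit rate, $6 the multiple
#NF == 7 {print $4 , $6}
NR == 1 {                  # first record: start the first group
    iBR = $4
    sum += $6              # awk statements are separated by a newline or a semicolon
    ++times
    next
}
$4 == iBR {                # same bit rate as the current group: accumulate
    sum += $6
    ++times
}
$4 != iBR {                # bit rate changed: report the finished group, start a new one
    print "bitRate", iBR, "rlt:", sum / times
    iBR = $4
    times = 1
    sum = $6
}
END { if (times > 0) print "bitRate", iBR, "rlt:", sum / times }
' "$@"    # | awk '{print $2,$1,$3,$4}' | sort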
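To see what the field separator does to one record, it can help to print every field. The sketch below assumes FS is the bracket expression [(),=]; the helper name show_fields is made up for this illustration and is not part of the script above.

```shell
#!/bin/sh
# show_fields (hypothetical helper): print NF and every field of each input line,
# assuming the separators are "(", ")", "," and "=".
show_fields() {
    awk 'BEGIN { FS = "[(),=]" }
    {
        printf "NF=%d\n", NF
        for (i = 1; i <= NF; i++)
            printf "$%d=[%s]\n", i, $i
    }'
}

printf '%s\n' 'resolution=1280,bitRate=1000000,YUV/H264multiple=202)' | show_fields
```

Field 4 holds the bit rate and field 6 the multiple; the trailing ) leaves an empty seventh field, which is why NF comes out as 7 for this line.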
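II. Using awk statements and awk built-in functions
Beginner problem 2: I want to quickly test a regular expression I have written against whatever strings I can think of.
Run the script: $./CheckPattern.sh mypattern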
> Xxx
Ok!
> xxdd
Wrong!
The shell script CheckPattern1.sh:
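#!/bin/sh
#
currentTTY=$(tty)
echo "$currentTTY"
# Get the pattern from the command-line parameters.
# Note: $* is a shell variable holding the command-line arguments (parent process,
# comparable to a global), while $0, $1, $2, $3... inside awk are awk's own
# variables (child process, comparable to locals). Passing a variable from the
# shell into awk is easy enough; the reverse seems more awkward.
testTP=$*
echo "$testTP"
awk -v TP="$testTP" -v CTTY="$currentTTY" 'BEGIN {
    print "the pattern is:", TP;
    # keep reading lines from the terminal until end of input (Ctrl-D) or an error
    while ((getline str < CTTY) > 0) {
        print "input str is :", str;
        if (str ~ TP) { print "pattern:", TP, "str:", str, "rlt:", "match!" }
        else { print "pattern:", TP, "str:", str, "rlt:", "mismatching!" }
    }
}'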
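The script above hands the pattern to awk with -v. Here is a minimal non-interactive sketch of the same idea; the function name check_pattern is hypothetical. It also shows one common way to get a result back from awk into the shell, by capturing awk's output with command substitution. One caveat: awk processes backslash escapes in -v values, so a pattern that contains backslashes may need them doubled.

```shell
#!/bin/sh
# check_pattern PATTERN STRING (hypothetical helper)
# Prints "match!" or "mismatching!" depending on whether STRING matches PATTERN.
check_pattern() {
    # The double quotes keep a pattern or string containing spaces in one argument.
    awk -v TP="$1" -v STR="$2" 'BEGIN {
        if (STR ~ TP) print "match!"
        else          print "mismatching!"
    }'
}

# shell -> awk: values go in through -v; awk -> shell: capture the output.
result=$(check_pattern '^foo +bar$' 'foo   bar')
echo "result: $result"
```

Because the answer comes back as plain text, the calling shell script can store it in a variable or test it with an ordinary comparison.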
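The question is how to get the pattern from the shell command line into the BEGIN block. The solution used here is awk's -v option, which assigns TP and CTTY before BEGIN runs. $testTP is wrapped in double quotes because, without them, a pattern containing a space would be split by the shell into several words: only the first word would become the value of TP, and awk would misread the rest as separate command-line arguments (program text or input files) instead of as part of the pattern.
But why pass the pattern on the shell command line at all? Used this way, the pattern cannot be changed on the fly: you have to quit the program, start it again, and retype the pattern on the command line. An improved version follows: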
Beginner problem 3: type in any regular expression, then quickly test it against whatever strings I can think of.
The shell script CheckPattern2.sh:
#!/bin/sh
# check pattern program
echo "welcome to use the check pattern program v.0.0.1"
echo "caution:the pattern only awk's version, grep or sed maybe mismatch!"
echo "usage $./checkpattern2.sh"
currentTTY=$(tty)
echo "$currentTTY"
awk -v CTTY="$currentTTY" 'BEGIN {
    # TP starts out empty; an empty pattern matches every string
    while (1) {
        print "\n";
        print "current pattern is:", "\"" TP "\"";
        print "input \"K\" to check pattern; \"G\" to change pattern; \"E\" to exit >> ";
        if ((getline str < CTTY) <= 0) break;    # stop on end of input or a read error
        if (str == "K") {
            print "please input string...";
            if ((getline str < CTTY) <= 0) break;
            print "string is :", "\"" str "\"";
            if (str ~ TP) { print "pattern:", "\"" TP "\"" "\n" "rlt:", "match!" }
            else { print "pattern:", "\"" TP "\"" "\n" "rlt:", "mismatching!" }
            continue;
        }
        if (str == "G") {
            print "please input pattern...";
            if ((getline TP < CTTY) <= 0) break;
            print "pattern is :", "\"" TP "\"";
            continue;
        }
        if (str == "E") {
            break;
        } else {
            print "cmd input error!"
        }
    }
}'
Sample run:
current pattern is: "^hello{2,}"
input "K" to check pattern; "G" to change pattern; "E" to exit >>
K
please input string...
hello
string is : "hello"
pattern: "^hello{2,}"
rlt: mismatching!
current pattern is: "^hello{2,}"
input "K" to check pattern; "G" to change pattern; "E" to exit >>
K
please input string...
hellooo
string is : "hellooo"
pattern: "^hello{2,}"
rlt: match!
Cool, isn't it?
Typing K every time is inconvenient, so let's change the program:
#!/bin/sh
# check pattern program
echo "welcome to use the check pattern program v.0.0.1"
echo "caution:the pattern only awk's version, grep or sed maybe mismatch!"
echo "usage $./checkpattern.sh"
currentTTY=$(tty)
echo "$currentTTY"
awk -v CTTY="$currentTTY" 'BEGIN {
    # TP starts out empty; an empty pattern matches every string
    while (1) {
        print "\n";
        print "current pattern is:", "\"" TP "\"";
        print "input string to check pattern; \"G\" to change pattern; \"E\" to exit >> ";
        if ((getline str < CTTY) <= 0) break;    # stop on end of input or a read error
        if (str == "G") {
            print "please input pattern...";
            if ((getline TP < CTTY) <= 0) break;
            print "pattern is :", "\"" TP "\"";
            continue;
        }
        if (str == "E") {
            break;
        }
        # anything else is a string to test against the current pattern
        print "string is :", "\"" str "\"";
        if (str ~ TP) { print "pattern:", "\"" TP "\"" "\n" "rlt:", "match!"; }
        else { print "pattern:", "\"" TP "\"" "\n" "rlt:", "mismatching!"; }
    }
}'
Sample run:
sfjiang@sf-vm:~/Desktop/AwkTest$ ./checkPattern2.sh
welcome to use the check pattern program v.0.0.1
caution:the pattern only awk's version, grep or sed maybe mismatch!
usage $./checkpattern.sh
/dev/pts/0
current pattern is: ""
input string to check pattern; "G" to change pattern; "E" to exit >>
me
string is : "me"
pattern: ""
rlt: match!
current pattern is: ""
input string to check pattern; "G" to change pattern; "E" to exit >>
G
please input pattern...
^me(.*)it$
pattern is : "^me(.*)it$"
current pattern is: "^me(.*)it$"
input string to check pattern; "G" to change pattern; "E" to exit >>
meiiit
string is : "meiiit"
pattern: "^me(.*)it$"
rlt: match!
current pattern is: "^me(.*)it$"
input string to check pattern; "G" to change pattern; "E" to exit >>
mit
string is : "mit"
pattern: "^me(.*)it$"
rlt: mismatching!
current pattern is: "^me(.*)it$"
input string to check pattern; "G" to change pattern; "E" to exit >>
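The interactive scripts in this section read the terminal with getline inside BEGIN. getline returns 1 when it reads a line, 0 at end of input and -1 on error, so testing the result with > 0 gives a loop that ends cleanly. The sketch below applies the same loop to an ordinary file in place of the terminal, which makes it easy to run a whole list of test strings at once. The function name check_file and the temporary file are assumptions made for this example.

```shell
#!/bin/sh
# check_file PATTERN FILE (hypothetical helper)
# Tests every line of FILE against PATTERN, using the same getline loop as the
# interactive scripts but with an ordinary file in place of the terminal.
check_file() {
    awk -v TP="$1" -v SRC="$2" 'BEGIN {
        # getline returns 1 for a line, 0 at end of file, -1 on error
        while ((getline str < SRC) > 0) {
            if (str ~ TP) print str ": match!"
            else          print str ": mismatching!"
        }
        close(SRC)
    }'
}

tmp=$(mktemp)
printf '%s\n' 'meiiit' 'mit' 'me it' > "$tmp"
check_file '^me(.*)it$' "$tmp"
rm -f "$tmp"
```

The pattern and the first two strings are the ones from the sample run above, so the results can be compared directly.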
III. Using arrays
Beginner problem 4: compute the average compression multiple for each bit rate
The file bitRate contains:
resolution=1280,bitRate=2000000,YUV/H264multiple=156)
resolution=1280,bitRate=1000000,YUV/H264multiple=202)
resolution=1280,bitRate=1000000,YUV/H264multiple=232)
resolution=1280,bitRate=3000000,YUV/H264multiple=154)
resolution=1280,bitRate=1000000,YUV/H264multiple=173)
resolution=1280,bitRate=2000000,YUV/H264multiple=146)
resolution=1280,bitRate=2000000,YUV/H264multiple=153)
resolution=1280,bitRate=3000000,YUV/H264multiple=153)
resolution=1280,bitRate=4000000,YUV/H264multiple=72)
resolution=1280,bitRate=4000000,YUV/H264multiple=73)
resolution=1280,bitRate=5000000,YUV/H264multiple=55)
......
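Beginner problem 1 tried to solve this problem, but that awk program was not robust enough (it requires records with the same bit rate to be adjacent) and its logic was rather tangled. With awk arrays, the solution becomes clear and natural!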
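#!/bin/bash
###
awk '
BEGIN { FS = "[(),=]" }        # $4 is the bit rate, $6 the compression multiple
NF >= 6 {                      # skip lines that do not have the expected fields
    ++times[$4];               # number of records seen for this bit rate
    sum[$4] += $6;             # running total of the multiples for this bit rate
}
END {
    for (i in sum)             # special for loop: i takes each subscript of sum, in no particular order
    {
        print "BitRate:", i, ", average compress:", sum[i] / times[i];
    }
}
' "$@"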
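This awk code is much clearer than the code for beginner problem 1. It relies mainly on awk's arrays and on the form of the for statement that iterates over an array's subscripts. I remember the preface of C++ Primer mentioning that, because of language limitations, programmers need many extra techniques and tricks to work around a language's shortcomings; that certainly seems to be true. Without a for statement that "extracts" the subscripts, one would probably need two arrays to relate the very large "subscripts" to the stored values, and traversal would be harder still.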
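As an aside, the two-array approach mentioned above is still useful in one case: the for-in loop visits subscripts in an unspecified order. The sketch below, which assumes that first-seen order is what we want, keeps a second array of keys next to the totals. The function name avg_in_order and the array name order are made up for this example.

```shell
#!/bin/sh
# avg_in_order [FILE...] (hypothetical helper)
# Average multiple per bit rate, reported in the order the bit rates first appear.
avg_in_order() {
    awk 'BEGIN { FS = "[(),=]" }
    NF >= 6 {
        if (!($4 in times))
            order[++n] = $4      # remember each bit rate the first time it appears
        times[$4]++
        sum[$4] += $6
    }
    END {
        for (k = 1; k <= n; k++) {
            br = order[k]
            print "BitRate:", br, ", average compress:", sum[br] / times[br]
        }
    }' "$@"
}

avg_in_order <<'EOF'
resolution=1280,bitRate=2000000,YUV/H264multiple=156)
resolution=1280,bitRate=1000000,YUV/H264multiple=202)
resolution=1280,bitRate=1000000,YUV/H264multiple=232)
EOF
```

If sorted output is enough, piping the result of the plain for-in version through sort is the simpler choice.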
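Having written these few awk programs, I can now form a rough picture of its programming style. awk feels freer than C: statements can be separated either by a newline or by a semicolon, and the last statement before a closing brace } needs no separator at all. In C, every statement must end with a semicolon, and only a semicolon will do.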
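A small sketch of the separator rule: the two functions below are meant to be equivalent, one separating statements with newlines and the other with semicolons. The function names are made up for the example.

```shell
#!/bin/sh
# Statements separated by newlines.
sum_newlines() {
    awk '{
        a = $1
        b = $2
        print a + b
    }'
}

# The same statements on one line, separated by semicolons;
# the last statement before } needs no separator.
sum_semicolons() {
    awk '{ a = $1; b = $2; print a + b }'
}

echo '3 4' | sum_newlines
echo '3 4' | sum_semicolons
```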
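Awk's arrays are far more powerful than C's. C works with numeric types, character types, structs and so on as its basic objects, whereas awk hides character-level detail and works directly with strings (fields). Arrays are one of awk's signature features: a subscript (key) can be either a number or a string. Like variables, arrays are created automatically on first use, and both have global scope (from first use until awk exits); the one exception is function parameters, which are local to the function.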
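To make the string-subscript point concrete, here is a minimal word-count sketch. Neither count nor its elements are declared anywhere; they come into existence the first time they are used. The helper name count_words is hypothetical.

```shell
#!/bin/sh
# count_words (hypothetical helper): count how often each word occurs on standard input.
count_words() {
    awk '{
        for (i = 1; i <= NF; i++)
            count[$i]++          # count[] and its elements are created on first use
    }
    END {
        for (w in count)         # w takes each string subscript in turn
            print w, count[w]
    }' | sort                    # for-in order is unspecified, so sort for stable output
}

printf '%s\n' 'awk sed awk' 'grep awk' | count_words
```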
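Awk's freedom and power call for some discipline, for example ending statements with a semicolon as in C. When operating on variables and arrays, enclose the group of operations in braces so that it reads as one action. For example, if the braces around the block that updates times and sum in the script above are removed, awk prints every input line, because the ++ and += expressions are then no longer inside an action: each one becomes a pattern with no action, and awk's default action for a pattern that evaluates to true is to print the record.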
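The effect of dropping the braces can be seen with a two-line input. This is only a sketch; the function names and the array name seen are made up for the example.

```shell
#!/bin/sh
# With braces: the increment is an action, and nothing is printed.
with_braces() {
    awk '{ ++seen[$1] }'
}

# Without braces: the increment is a pattern. Its value is never zero,
# so the default action (print the record) runs for every line.
without_braces() {
    awk '++seen[$1]'
}

printf '%s\n' 'a 1' 'b 2' | with_braces
printf '%s\n' 'a 1' 'b 2' | without_braces
```

Only the second call produces output, and it echoes both input lines.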