A Closer Look at awk, Part 2

Latest recommended article published on 2024-09-14 10:59:12

sf_jiang

Tags: awk, text data extraction, awk language, shell scripting

Article link: https://blog.csdn.net/sf_jiang/article/details/78875750

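Opinion: the best way to learn scripting is to start writing scripts! Once you have written enough of them, go back and read the rules (the awk manual).

1. Variables and system constants:

Beginner problem 1: compute the average compression ratio for each bit rate.

The file bitRate contains: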

resolution=1280,bitRate=1000000,YUV/H264multiple=202)

resolution=1280,bitRate=1000000,YUV/H264multiple=232)

resolution=1280,bitRate=1000000,YUV/H264multiple=173)

resolution=1280,bitRate=2000000,YUV/H264multiple=156)

resolution=1280,bitRate=2000000,YUV/H264multiple=146)

resolution=1280,bitRate=2000000,YUV/H264multiple=153)

resolution=1280,bitRate=3000000,YUV/H264multiple=153)

resolution=1280,bitRate=3000000,YUV/H264multiple=154)

resolution=1280,bitRate=4000000,YUV/H264multiple=72)

resolution=1280,bitRate=4000000,YUV/H264multiple=73)

resolution=1280,bitRate=4000000,YUV/H264multiple=75)

......

Write the shell script:

#!/bin/bash
# Caution: this awk program may contain bugs!
# It assumes that records with the same bitRate are adjacent
# (consider an input file in which every record is of a different kind ...).
# Input lines look like: "resolution=1280,bitRate=2000000,YUV/H264multiple=144)"
awk '
BEGIN { FS = "[()]|,|=" }    # split on parentheses, commas and equals signs
#NF == 7 { print $4, $6 }
NR == 1 {
    iBR = $4
    sum += $6                # awk statements are separated by newlines or semicolons
    ++times
    privBR = $4
    next
}
$4 == iBR {
    sum += $6
    ++times
    privBR = $4
}
$4 != iBR {                  # the bit rate changed: report the previous group, start a new one
    iBR = $4
    print "bitRate", privBR, "rlt:", sum/times
    times = 1
    sum = $6
    privBR = $4
}
END { if (times > 0) print "bitRate", privBR, "rlt:", sum/times }
' "$@"    # | awk '{print $2,$1,$3,$4}' | sort
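The script above relies on records with the same bitRate being adjacent, because it prints an average each time the rate changes. As a hedged alternative, the sketch below keeps a running sum and count per bit rate in awk associative arrays, so the order of the input lines no longer matters. The function name avg_by_bitrate is made up for this illustration, and the field separator assumes lines shaped like the sample file.

```shell
#!/bin/sh
# avg_by_bitrate: hypothetical helper name, not part of the original script.
# Prints "bitRate <rate> rlt: <average>" for each bit rate, sorted by rate.
avg_by_bitrate() {
    awk '
    BEGIN { FS = "[()]|,|=" }      # split on parentheses, commas and equals signs
    NF >= 6 {                      # skip blank or malformed lines
        sum[$4] += $6              # running total per bit rate
        cnt[$4]++                  # number of samples per bit rate
    }
    END {
        for (br in sum)            # iteration order is unspecified, so sort afterwards
            print "bitRate", br, "rlt:", sum[br] / cnt[br]
    }' "$@" | sort -k2,2n
}
```

It can be called as avg_by_bitrate bitRate. The price of not depending on input order is that all groups are held in memory until END, which is negligible for a handful of bit rates.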

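2. Using awk statements and awk built-in functions

Beginner problem 2: I want to quickly test a regular expression I have written against whatever strings I can think of.

Run the script: $ ./CheckPattern.sh mypattern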

> Xxx

Ok!

> xxdd

Wrong!

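Write the shell script CheckPattern1.sh:

#!/bin/sh
currentTTY=$(tty)
echo "$currentTTY"
# Get the pattern from the command-line parameters.
# Note: $* is a shell variable holding the command-line arguments (parent process,
# comparable to a global variable), whereas $0, $1, $2, $3 ... inside awk are
# awk field variables (child process, comparable to local variables).
# Passing a variable from the shell into awk is easy enough; the reverse seems more awkward.
testTP=$*
echo "$testTP"
awk -v TP="$testTP" -v CTTY="$currentTTY" 'BEGIN {
    print "the pattern is:", TP;
    while ((getline str < CTTY) > 0) {    # stop at end of input or on a read error
        print "input str is :", str;
        if (str ~ TP) { print "pattern:", TP, "str:", str, "rlt:", "match!" }
        else          { print "pattern:", TP, "str:", str, "rlt:", "mismatching!" }
    }
}'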
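The -v option used above assigns awk variables before the BEGIN block runs. A minimal sketch of that hand-over, reduced to a single match, is given below together with an alternative that reads the values from awk's ENVIRON array. Both function names are invented for this illustration. One reason to consider ENVIRON: awk processes backslash escape sequences in -v assignments, so a pattern such as a\.b may not arrive unchanged, whereas environment values are taken literally.

```shell
#!/bin/sh
# Both helper names below are hypothetical, chosen for this illustration.

# Pass the pattern ($1) and the string ($2) with -v; the double quotes keep
# embedded spaces inside a single argument.
match_with_v() {
    awk -v TP="$1" -v S="$2" 'BEGIN { if (S ~ TP) print "match!"; else print "mismatching!" }'
}

# Alternative: hand the values over through the environment and read them
# from ENVIRON, which does not interpret backslash escapes.
match_with_env() {
    TP="$1" S="$2" awk 'BEGIN { if (ENVIRON["S"] ~ ENVIRON["TP"]) print "match!"; else print "mismatching!" }'
}

match_with_v   "hello world" "say hello world"
match_with_env "hello world" "say hello world"
```

Both calls print match!, since the pattern with its embedded space reaches awk as one value.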

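The question is how to get the pattern given on the shell command line into the BEGIN block. The solution used here is awk's -v option, which assigns the TP and CTTY variables before BEGIN is executed. $testTP is wrapped in double quotes because, if the pattern contains a space, an unquoted expansion is split by the shell into several words: -v then receives only the first word, and the second word is handed to awk as a separate argument (program text or an input file name) instead of being part of the pattern.

But why pass the pattern on the shell command line at all? Used that way, the pattern cannot be changed on the fly; you have to quit the program, start it again and retype the pattern on the command line. An improved version follows:

Beginner problem 3: enter any regular expression pattern and quickly test it against whatever strings I can think of.

Write the shell script CheckPattern2.sh: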

#!/bin/sh
# check pattern program
echo "welcome to use the check pattern program v.0.0.1"
echo "caution:the pattern only awk's verion, grep or sed maybe mismatch!"
echo "usage $./checkpattern2.sh"
currentTTY=$(tty)
echo "$currentTTY"
awk -v CTTY="$currentTTY" 'BEGIN {
    #print "the pattern is:","\""TP"\"";
    while (1 > 0) {
        print "\n";
        print "current pattern is:", "\"" TP "\"";
        print "input \"K\" to check pattern; \"G\" to change pattern; \"E\" to exit >> ";
        if ((getline str < CTTY) <= 0) break;    # stop at end of input or on a read error
        if (str == "K") {
            print "please input string...";
            getline str < CTTY;
            print "string is :", "\"" str "\"";
            if (str ~ TP) { print "pattern:", "\"" TP "\"" "\n" "rlt:", "match!" }
            else          { print "pattern:", "\"" TP "\"" "\n" "rlt:", "mismatching!" }
            continue;
        }
        if (str == "G") {
            print "please input pattern...";
            getline TP < CTTY;
            print "pattern is :", "\"" TP "\"";
            continue;
        }
        if (str == "E") {
            break;
        } else {
            print "cmd input error!"
        }
    }
}'

Sample run:

current pattern is: "^hello{2,}"

input "K" to check pattern; "G" to change pattern; "E" to exit >>

please input string...

hello

string is : "hello"

pattern: "^hello{2,}"

rlt: mismatching!

current pattern is: "^hello{2,}"

input "K" to check pattern; "G" to change pattern; "E" to exit >>

please input string...

hellooo

string is : "hellooo"

pattern: "^hello{2,}"

rlt: match!

Cool, isn't it?
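The heart of the tool is the dynamic match str ~ TP, where TP is an ordinary string that awk treats as a regular expression. The same check can be sketched non-interactively, reading candidate strings from standard input instead of the terminal. The function name check_strings is hypothetical. The pattern ^helloo+ is used as a stand-in for ^hello{2,} on the assumption that the local awk may lack interval expressions such as {2,}, which some older awk versions do not support.

```shell
#!/bin/sh
# check_strings: hypothetical helper, not part of the original scripts.
# $1 is the pattern; candidate strings are read from standard input, one per line.
check_strings() {
    awk -v TP="$1" '{
        if ($0 ~ TP) print $0 ": match!"    # TP is a string used as a dynamic regex
        else         print $0 ": mismatching!"
    }'
}

printf '%s\n' hello hellooo | check_strings '^helloo+'
```

This prints hello: mismatching! followed by hellooo: match!, the same verdicts as in the interactive run.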

Having to type K every time in CheckPattern2.sh is inconvenient, so let's change the program:
#!/bin/sh
# check pattern program
echo "welcome to use the check pattern program v.0.0.1"
echo "caution:the pattern only awk's verion, grep or sed maybe mismatch!"
echo "usage $./checkpattern.sh"
currentTTY=$(tty)
echo "$currentTTY"
awk -v CTTY="$currentTTY" 'BEGIN {
    #print "the pattern is:","\""TP"\"";
    while (1 > 0) {
        print "\n";
        print "current pattern is:", "\"" TP "\"";
        print "input string to check pattern; \"G\" to change pattern; \"E\" to exit >> ";
        if ((getline str < CTTY) <= 0) break;    # stop at end of input or on a read error
        if (str == "G") {
            print "please input pattern...";
            getline TP < CTTY;
            print "pattern is :", "\"" TP "\"";
            continue;
        }
        if (str == "E") {
            break;
        }
        # any other input is a string to check against the pattern
        print "string is :", "\"" str "\"";
        if (str ~ TP) { print "pattern:", "\"" TP "\"" "\n" "rlt:", "match!"; }
        else          { print "pattern:", "\"" TP "\"" "\n" "rlt:", "mismatching!"; }
        continue;
    }
}'

Sample run:

sfjiang@sf-vm:~/Desktop/AwkTest$ ./checkPattern2.sh

welcome to use the check pattern program v.0.0.1

caution:the pattern only awk's verion, grep or sed maybe mismatch!

usage $./checkpattern.sh

/dev/pts/0

current pattern is: ""
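The scripts in this section read input with getline var < file. It is worth knowing that getline returns 1 when a line was read, 0 at end of file and -1 on error, so the return value can drive a loop and also reveal an unreadable source. The sketch below illustrates this on an ordinary file rather than the terminal. The function name count_lines and the variable names are assumptions made for this example.

```shell
#!/bin/sh
# count_lines: hypothetical helper; counts the lines of the file named in $1 with getline.
count_lines() {
    awk -v SRC="$1" 'BEGIN {
        n = 0
        while ((status = (getline line < SRC)) > 0)    # 1: a line was read
            n++
        if (status < 0) {                              # -1: the file could not be read
            print "cannot read " SRC
            exit 1
        }
        close(SRC)                                     # 0: end of file was reached
        print n
    }'
}
```

A loop written as while (1 > 0) with an unchecked getline would spin forever once the input ends, which is why testing the return value is the safer habit.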