Understanding awk in Depth
Introduction
man awk
awk
NAME
awk - pattern-directed scanning and processing language
SYNOPSIS
awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]
DESCRIPTION
Awk scans each input file for lines that match any of a set of patterns specified literally in prog or in one or more
files specified as -f progfile. With each pattern there can be an associated action that will be performed when a line
of a file matches the pattern. Each line is matched against the pattern portion of every pattern-action statement; the
associated action is performed for each matched pattern.
- awk was created by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan; the first version was written in 1977.
- awk is a programming language.
Benchmarks
bbs_list
Each row is a computer bulletin board, its phone number, its baud rate(s), and a code.
aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
barfly 555-7685 1200/300 A
bites 555-1675 2400/1200/300 A
camelot 555-0542 300 C
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
sabafoo 555-2127 1200/300 C
inventory-shipped
Each row is a month, followed by the number of green crates, red boxes, orange bags, and blue packages shipped.
Jan 13 25 15 115
Feb 15 32 24 226
Mar 15 24 34 228
Apr 31 52 63 420
May 16 34 29 208
Jun 31 42 75 492
Jul 24 34 67 436
Aug 15 34 47 316
Sep 13 55 37 277
Oct 29 54 68 525
Nov 20 87 82 577
Dec 17 35 61 401
Jan 21 36 64 620
Feb 26 58 80 652
Mar 24 75 70 495
Apr 21 70 74 514
e1. Find lines that contain foo
MacBook-Pro-3:benchmarks sunquan$ awk '/foo/{print $0}' bbs_list
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C
- /foo/: pattern
- {print $0}: action
e2. Find lines that contain the string 12 or 21
MacBook-Pro-3:benchmarks sunquan$ awk '/12/{print $0}/21/{print $0}' bbs_list inventory-shipped
aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
barfly 555-7685 1200/300 A
bites 555-1675 2400/1200/300 A
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
sabafoo 555-2127 1200/300 C
sabafoo 555-2127 1200/300 C
Jan 21 36 64 620
Apr 21 70 74 514
- If a line contains both strings, it is printed twice, once by each rule.
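To print such a line only once, the two regexps can be combined into a single rule with ||. The following is a minimal sketch, not part of the original session; the three sample rows are copied from bbs_list and inventory-shipped, and the shell variable names are made up for illustration.

```shell
# Sample rows copied from the benchmark files above.
rows='core 555-2912 1200/300 C
sabafoo 555-2127 1200/300 C
Jan 21 36 64 620'

# Two rules: a row matching both /12/ and /21/ is printed by each rule.
two_rules=$(printf '%s\n' "$rows" | awk '/12/ {print $0} /21/ {print $0}')

# One rule with ||: every matching row is printed exactly once.
one_rule=$(printf '%s\n' "$rows" | awk '/12/ || /21/ {print $0}')

printf 'two rules:\n%s\n\none rule:\n%s\n' "$two_rules" "$one_rule"
```

With two rules the sabafoo row appears twice; with the combined rule it appears once.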
e3. Sum the sizes of the files in the current directory that were last modified in July
MacBook-Pro-3:benchmarks sunquan$ ls -l
total 16
-rw-r--r-- 1 sunquan staff 484 7 13 11:05 bbs_list
-rw-r--r-- 1 sunquan staff 320 7 13 11:22 inventory-shipped
MacBook-Pro-3:benchmarks sunquan$ ls -l | awk '$6 == "7" {sum += $5} END {print sum}'
804
MacBook-Pro-3:benchmarks sunquan$
Program
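- You write an awk program as a series of rules that tell awk what to do.
- awk reads the input files and checks every line against the rules in the program.
- If a line matches a rule, awk runs that rule's action; this continues until the last line.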
Program format
- awk 'program' input-file1 input-file2
- awk -f program-file input-file1 input-file2
- A program looks like: pattern { action } pattern { action } …
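The two invocation forms can be compared side by side. This is a small sketch under stated assumptions: the scratch directory and the file names data.txt and rules.awk are hypothetical, and the two data rows are copied from bbs_list.

```shell
# Work in a scratch directory; the file names below are made up for this sketch.
dir=$(mktemp -d)
printf 'fooey 555-1234 2400/1200/300 B\ncamelot 555-0542 300 C\n' > "$dir/data.txt"

# A program file holds the same pattern { action } rules, one per line.
cat > "$dir/rules.awk" <<'EOF'
/foo/ { print "match:", $1 }
END   { print NR, "records read" }
EOF

# Form 1: program given on the command line.
inline=$(awk '/foo/ { print "match:", $1 } END { print NR, "records read" }' "$dir/data.txt")

# Form 2: program read from a file with -f.
from_file=$(awk -f "$dir/rules.awk" "$dir/data.txt")

printf '%s\n' "$from_file"
rm -rf "$dir"
```

Both forms should produce the same output; a program file is usually easier to maintain once the program grows beyond one line.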
Record
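awk splits the contents of its input files into records. The split is controlled by the built-in variable RS, whose default value is \n, so by default each line is one record.
awk '{ print $0 }' RS="/" bbs_list
or
awk 'BEGIN { RS = "/" } ; { print $0 }' bbs_list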
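The effect of RS is easier to see on a single short string. A minimal sketch, using a baud-rate value from bbs_list as input (the input deliberately has no trailing newline, so the last record is exactly 300):

```shell
# With RS = "/", each slash-separated chunk becomes its own record.
records=$(printf '2400/1200/300' | awk 'BEGIN { RS = "/" } { print NR ": " $0 }')
printf '%s\n' "$records"
```

Once RS is no longer a newline, newline characters are ordinary data inside a record, which is why running the same program on a whole file prints lines that span the original line breaks.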
Field
Each record is automatically split into fields, and you refer to a field with $. For example, in the record
This seems like a pretty nice example
$1 is This.
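The same sentence can be fed to awk to check how it is split. A short sketch; it also prints NF (the number of fields) and $NF (the last field):

```shell
line='This seems like a pretty nice example'

# $1 and $2 are the first two fields, NF is the field count, $NF the last field.
fields=$(printf '%s\n' "$line" | awk '{ print $1; print $2; print NF; print $NF }')
printf '%s\n' "$fields"
```

This prints This, seems, 7 and example, one per line.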
e4. Compute the total of the four numeric fields in each row
awk '{ total = ($5 + $4 + $3 + $2) ; print total }' inventory-shipped
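A running total across rows can be kept in a second variable and printed from an END rule. The sketch below uses the first two rows of inventory-shipped; the variable name grand is only for illustration.

```shell
# Per-row total plus a grand total printed after the last row.
totals=$(printf 'Jan 13 25 15 115\nFeb 15 32 24 226\n' |
  awk '{ total = $2 + $3 + $4 + $5; grand += total; print $1, total }
       END { print "all", grand }')
printf '%s\n' "$totals"
```

Uninitialized awk variables start out as zero (or the empty string), so grand needs no explicit initialization.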
内置变量说明
术语 | 解释 | 用法 |
---|---|---|
RS | record分隔符 | awk '{ print $0 } RS="/" bbs_list |
$NF | 最后一个field | awk ‘$1 ~ /foo/ {print 1 , 1, 1,NF}’ bbs_list |
$0 | 当前record | awk ‘{print $0}’ bbs_list |
NF | the number of fields | awk ‘END {print "the num of fields is " NF}’ bbs_list |
NR | line num | awk ‘{print NR , $0}’ bbs_list |
FS | field分隔符 | awk ‘BEGIN {FS = “/”} ; {print $1}’ bbs_list |
使用-F-指定FS | awk -F- ‘{print $1}’ bbs_list | |
OFS | 输出Field分隔符 | |
ORS | 输出Record分隔符 | awk ‘BEGIN { OFS = “;” ; ORS = “\n\n” } ; {print $1, $2}’ bbs_list |
> | 输出到文件中 | awk ‘BEGIN {OFS = “;”} ; {print $1, $2 > “result”}’ bbs_list |
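The separators are easiest to compare on a couple of rows. A minimal sketch with two rows taken from bbs_list (only the first two columns); the shell variable names are made up for this example.

```shell
data='aardvark 555-5553
barfly 555-7685'

# OFS is inserted wherever print has a comma; ORS ends every print.
joined=$(printf '%s\n' "$data" | awk 'BEGIN { OFS = ";"; ORS = "|" } { print $1, $2 }')

# Without a comma the values are concatenated and OFS is not used.
glued=$(printf '%s\n' "$data" | awk 'BEGIN { OFS = ";" } { print $1 $2 }')

# -F- splits on "-" instead of whitespace, so $2 is the part after the dash.
suffix=$(printf '%s\n' "$data" | awk -F- '{ print NR, $2 }')

printf '%s\n%s\n%s\n' "$joined" "$glued" "$suffix"
```

Note that OFS only takes effect between comma-separated arguments of print; it does not change $0 unless a field is modified.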
Patterns
Term | Meaning | awk usage |
---|---|---|
exp ~ /regexp/ | true if exp matches regexp | awk '$3 ~ /200/ {print $1, $3}' bbs_list |
exp !~ /regexp/ | true if exp does not match regexp | awk '$3 !~ /200/ {print $1, $3}' bbs_list |
< <= > >= == != ~ !~ | comparison expressions | awk '$1 ~ /foo/ {print $1}' bbs_list |
&& \|\| ! | Boolean operators | awk '$1 ~ /foo/ && $3 ~ 300 {print $1, $3}' bbs_list |
BEGIN | runs once, before the first record is read | |
END | runs once, after the last record is read | |
Empty pattern | matches every record | awk '{ print $1 }' bbs_list |
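Several of these pattern kinds can appear in one program. The sketch below combines BEGIN, a regexp match joined to a comparison with &&, and END; the four rows are copied from bbs_list and the counter name n is arbitrary.

```shell
data='aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
camelot 555-0542 300 C
fooey 555-1234 2400/1200/300 B'

report=$(printf '%s\n' "$data" | awk '
  BEGIN                    { print "boards with 2400 baud:" }    # before any input
  $3 ~ /2400/ && $4 != "C" { print "  " $1; n++ }                # regexp and comparison
  END                      { print n " of " NR " rows matched" } # after all input
')
printf '%s\n' "$report"
```

In the END rule NR still holds the number of records read, which makes it a convenient place for summaries.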
Action
Expressions as Action
Term | Meaning | awk usage |
---|---|---|
\\ \a \b \f \n \r \t | escape sequences in string constants | awk 'BEGIN {print "hello \n world"}' bbs_list |
var=text | variable assigned on the command line | awk '{print $var}' var=1 bbs_list |
x op y | operators | awk '{ total = $2 + $3 / $4 ; print total}' inventory-shipped |
fun(args) | function call | awk 'BEGIN {result = rand() ; print result}' bbs_list |
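Two details of the table are worth checking by hand: when a command-line variable becomes visible, and how operators group. A small sketch using the first row of inventory-shipped; the shell variable names late, early and prec are made up.

```shell
row='Jan 13 25 15 115'

# Trailing var=2: assigned after BEGIN has run, just before the input is read.
late=$(printf '%s\n' "$row" | awk 'BEGIN { print "BEGIN sees [" var "]" } { print $var }' var=2)

# -v var=2: assigned before the program starts, so BEGIN already sees it.
early=$(printf '%s\n' "$row" | awk -v var=2 'BEGIN { print "BEGIN sees [" var "]" } { print $var }')

# / binds tighter than +, so $2 + $3 / $4 is 13 + (25 / 15), not (13 + 25) / 15.
prec=$(printf '%s\n' "$row" | awk '{ printf "%.2f %.2f\n", $2 + $3 / $4, ($2 + $3) / $4 }')

printf '%s\n---\n%s\n---\n%s\n' "$late" "$early" "$prec"
```

In both variable cases $var ends up meaning $2; the only difference is whether the BEGIN rule can already see the value.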
Control Statement in Action
Term | Meaning | awk usage |
---|---|---|
if (condition) then-body else else-body | if/else | awk '{if ($2 % 2 == 0) print $2 " is even" ; else print $2 " is odd"}' inventory-shipped |
while (condition) body | while loop | awk '{ i = 1; while (i <= 3) { print $i; i++ } }' bbs_list |
for (initialization; condition; increment) body | for loop | awk '{ for (i=1; i<=3; i++) print $i }' bbs_list |
break continue | break leaves the enclosing loop; continue skips to its next iteration | |
next | stop processing the current record and move on to the next one | |
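The last two rows of the table have no usage column, so here is a minimal sketch of next, break and continue working together. The input data is invented for this example.

```shell
data='# comment line
10 20 -1 40
1 2 3'

sums=$(printf '%s\n' "$data" | awk '
  /^#/ { next }                      # skip this record; later rules never see it
  {
    sum = 0
    for (i = 1; i <= NF; i++) {
      if ($i < 0) break              # stop the loop at the first negative field
      if ($i % 2 == 1) continue      # skip odd values, keep looping
      sum += $i
    }
    print NR ": " sum
  }')
printf '%s\n' "$sums"
```

The comment line still counts toward NR, so the first printed record number is 2.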
Array in Awk
Arrays in awk behave like maps (associative arrays).
- array[index] reads a value
- array[subscript] = value stores a value
- you don't need to declare the size of an array
#! /usr/bin/awk -f
{
if ($1 > max)
max = $1
arr[$1] = $0
}
END{
for (x = 1; x <= max; x++)
print arr[x]
}
## input data
5 I am the Five man
2 Who are you? The new number two!
4 . . . And four on the floor
1 Who is number one?
3 I three you.
## output result
1 Who is number one?
2 Who are you? The new number two!
3 I three you.
4 . . . And four on the floor
5 I am the Five man
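Because subscripts are really strings, arrays are also a natural fit for counting. The sketch below counts how often each code occurs; the rows are a trimmed version of bbs_list (name and code only), and the output is piped through sort because the order of a for (key in array) loop is unspecified.

```shell
data='aardvark B
alpo-net A
barfly A
camelot C'

counts=$(printf '%s\n' "$data" | awk '
  { count[$2]++ }                       # string subscripts; elements are created on first use
  END {
    for (code in count)                 # iteration order is unspecified
      print code, count[code]
    if (!("Z" in count)) print "no Z"   # "in" tests membership without creating the element
  }' | LC_ALL=C sort)
printf '%s\n' "$counts"
```

Testing with in rather than reading count["Z"] matters, since merely referencing an element creates it.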
e5. An awk script: hello world
MacBook-Pro-3:shell sunquan$ which awk
/usr/bin/awk
# demo: awkHelloWorld
#! /usr/bin/awk -f
BEGIN {print "hello, world"}
MacBook-Pro-3:shell sunquan$ awkHelloWorld
hello, world
Func in Awk
#! /usr/bin/awk -f
#file name is awkHelloWorld
function myFunc (win) {
print "the value is", win
}
function doBefore (){
print "hello world"
}
function doLast (){
print "the num of lines is", NR
}
BEGIN {doBefore()} ; {myFunc($1)} ; END {doLast()}
# result
awkHelloWorld benchmarks/bbs_list
hello world
the value is aardvark
the value is alpo-net
the value is barfly
the value is bites
the value is camelot
the value is core
the value is fooey
the value is foot
the value is macfoo
the value is sdace
the value is sabafoo
the num of lines is 11
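The functions above only print. A function can also return a value, and extra parameters serve as local variables. The following is a small sketch, not taken from the script above; the function name max_field and the input numbers are invented for illustration.

```shell
result=$(printf '3 9 4\n7 2\n' | awk '
  # Extra parameters (i, m) act as local variables; callers simply omit them.
  function max_field(i, m) {
    m = $1
    for (i = 2; i <= NF; i++)
      if ($i > m) m = $i
    return m
  }
  { print "max of record " NR " is " max_field() }')
printf '%s\n' "$result"
```

Variables that are not listed as parameters are global in awk, so declaring loop counters this way keeps a function from clobbering the caller's variables.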
About the author
Work: Senior Engineer, Alibaba
email: sunquan9301@163.com
WX: sunquan97
HomePage: qsun97.com