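AWK is an interpreted programming language. It is powerful and was designed specifically for processing text data. Its name comes from the initials of its designers: Alfred Aho, Peter Weinberger, and Brian Kernighan.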
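1. Structure of an AWK program
An AWK program consists of three parts: a BEGIN block, a body block, and an END block.
1.1 BEGIN block
The syntax of the BEGIN block is: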
BEGIN {awk-commands}
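As the name suggests, the BEGIN block runs when the program starts, before any input is read, and it runs only once. It is typically used to initialize variables. BEGIN is an AWK keyword, so it must be written in uppercase. The BEGIN block is optional; a program may omit it.
[Keywords]: runs only once; optional
1.2 Body block
The syntax of the body block is: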
/pattern/ {awk-commands}
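The commands in the body block run once for each input line. By default AWK runs them on every line, but a pattern restricts them to the lines that match it. Note that the body block has no keyword.
[Keywords]: runs once per line; optional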
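To illustrate how a pattern limits the body block to matching lines, here is a minimal sketch. The input lines are inlined with printf so the snippet is self-contained, and the variable name matched is only illustrative.

```shell
# Only lines matching the regular expression /Maths/ reach the action.
# The sample input is inlined here so the snippet is self-contained.
matched=$(printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Shyam Biology 87\n' |
  awk '/Maths/ { print $2 }')
echo "$matched"
```

Only the second line matches, so the action runs once and prints that line's second field.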
1.3 END block
The syntax of the END block is:
END {awk-commands}
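The END block runs when the program finishes, after all input has been processed. END is also an AWK keyword and must be uppercase. Like the BEGIN block, the END block is optional.
[Keywords]: runs only once; optional
1.4 Example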
marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
[hadoop@ruozedata001 22]$ awk 'BEGIN{printf "start\nSr No\tName\tSub\tMarks\n"} {print} END {print "end"}' marks.txt
start
Sr No Name Sub Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
end
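The three blocks are often combined to accumulate a value across lines and report it at the end. The sketch below assumes the same five records as marks.txt (inlined via printf) and uses the built-in variable NR, which holds the number of records read so far; the variable names total and summary are only illustrative.

```shell
# BEGIN initializes, the body block accumulates per line, END reports.
# NR is the number of records read; in END it equals the total line count.
summary=$(printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Shyam Biology 87\n4) Kedar English 85\n5) Hari History 89\n' |
  awk 'BEGIN { total = 0 }
       { total += $4 }
       END { printf "total=%d avg=%.1f\n", total, total / NR }')
echo "$summary"
```

With these five records the marks sum to 431, so the script prints total=431 avg=86.2.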
2. Common AWK built-in variables
2.1 ARGV
This variable is an array that stores the command-line arguments. Its valid indices run from 0 to ARGC-1.
[hadoop@ruozedata001 22]$ awk 'BEGIN { for (i = 0; i < ARGC; ++i)
{ printf "ARGV[%d] = %s\n", i, ARGV[i] }
}' one two three four
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
2.2 ENVIRON
This variable is an associative array of the environment variables, indexed by variable name.
[hadoop@ruozedata001 22]$ awk 'BEGIN { print ENVIRON["USER"] }'
hadoop
[hadoop@ruozedata001 22]$ awk 'BEGIN { print ENVIRON["JAVA_HOME"] }'
/usr/java/jdk1.8.0_45
[hadoop@ruozedata001 22]$ awk 'BEGIN { print ENVIRON["ZOOKEEPER_HOME"] }'
/home/hadoop/app/zookeeper
2.3 FILENAME
This variable holds the name of the current input file.
[hadoop@ruozedata001 22]$ awk 'END {print FILENAME}' marks.txt
marks.txt
2.4 $0
This variable holds the entire current input record.
[hadoop@ruozedata001 22]$ awk '{print $0}' marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
2.5 $n
This variable holds the n-th field of the current input record. Fields are separated by FS.
[hadoop@ruozedata001 22]$ awk '{print $3 "\t" $4}' marks.txt
Physics 80
Maths 90
Biology 87
English 85
History 89
[hadoop@ruozedata001 22]$ awk '{print $3 "\t" $2}' marks.txt
Physics Amit
Maths Rahul
Biology Shyam
English Kedar
History Hari
[hadoop@ruozedata001 22]$ awk '{print $1 "\t" $2}' marks.txt
1) Amit
2) Rahul
3) Shyam
4) Kedar
5) Hari
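The examples above rely on the default FS, which splits on whitespace. As a small sketch of changing the separator, the following sets FS to a colon in the BEGIN block; the colon-separated input is hypothetical and inlined for the demonstration.

```shell
# Set the field separator to ":" before any input is read,
# then print the first and third fields of each record.
fields=$(printf 'Amit:Physics:80\nRahul:Maths:90\n' |
  awk 'BEGIN { FS = ":" } { print $1 "-" $3 }')
echo "$fields"
```

The command-line option -F ":" would have the same effect as the assignment in BEGIN.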
2.6 PROCINFO
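This is an associative array that holds information about the running process, such as the real and effective UID and the process ID. PROCINFO is a GNU awk (gawk) extension.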
[hadoop@ruozedata001 22]$ awk 'BEGIN { print PROCINFO["pid"] }'
26488
[hadoop@ruozedata001 22]$ awk 'BEGIN { print PROCINFO["pid"] }'
26506
3. AWK operators
3.1 Arithmetic operators: +, -, *, /, %
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 50; b = 20; print "(a + b) = ", (a + b) }'
(a + b) = 70
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 50; b = 20; print "(a - b) = ", (a - b) }'
(a - b) = 30
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 50; b = 20; print "(a * b) = ", (a * b) }'
(a * b) = 1000
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 50; b = 20; print "(a / b) = ", (a / b) }'
(a / b) = 2.5
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 50; b = 20; print "(a % b) = ", (a % b) }'
(a % b) = 10
3.2 Assignment operators: =, +=, -=, *=, /=, %=
[hadoop@ruozedata001 22]$ awk 'BEGIN { name = "Jerry"; print "My name is", name }'
My name is Jerry
[hadoop@ruozedata001 22]$ awk 'BEGIN { cnt=10; cnt += 10; print "Counter =", cnt }'
Counter = 20
[hadoop@ruozedata001 22]$ awk 'BEGIN { cnt=100; cnt -= 10; print "Counter =", cnt }'
Counter = 90
[hadoop@ruozedata001 22]$ awk 'BEGIN { cnt=10; cnt *= 10; print "Counter =", cnt }'
Counter = 100
[hadoop@ruozedata001 22]$ awk 'BEGIN { cnt=100; cnt /= 5; print "Counter =", cnt }'
Counter = 20
[hadoop@ruozedata001 22]$ awk 'BEGIN { cnt=100; cnt %= 8; print "Counter =", cnt }'
Counter = 4
3.3 Relational operators: ==, !=, <, <=, >, >=
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 10; b = 10; if (a == b) print "a == b" }'
a == b
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 10; b = 20; if (a != b) print "a != b" }'
a != b
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 10; b = 20; if (a < b) print "a < b" }'
a < b
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 10; b = 10; if (a <= b) print "a <= b" }'
a <= b
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 10; b = 20; if (b > a ) print "b > a" }'
b > a
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 10; b = 10; if (a >= b) print "a >= b" }'
a >= b
3.4 Logical operators: &&, ||, !
[hadoop@ruozedata001 22]$ awk 'BEGIN {num = 5; if (num >= 0 && num <= 7) printf "%d is in octal format\n", num }'
5 is in octal format
[hadoop@ruozedata001 22]$ awk 'BEGIN {ch = "\n"; if (ch == " " || ch == "\t" || ch == "\n") print "Current character is whitespace." }'
Current character is whitespace.
[hadoop@ruozedata001 22]$ awk 'BEGIN { name = ""; if (! length(name)) print "name is empty string." }'
name is empty string.
3.5 Ternary operator
Syntax of the ternary operator: condition ? expression1 : expression2
[hadoop@ruozedata001 22]$ awk 'BEGIN { a = 10; b = 20; max = (a > b) ? a : b; print "Max =", max }'
Max = 20
4. AWK arrays
4.1 One-dimensional arrays
[hadoop@ruozedata001 22]$ awk 'BEGIN {
> array["0"] = 100;
> array["1"] = 200;
> array["2"] = 300;
> array["3"] = 400;
> array["4"] = 500;
> array["5"] = 600;
> # print array elements
> print "array[0] = " array["0"];
> print "array[1] = " array["1"];
> print "array[2] = " array["2"];
> print "array[3] = " array["3"];
> print "array[4] = " array["4"];
> print "array[5] = " array["5"];
> }'
array[0] = 100
array[1] = 200
array[2] = 300
array[3] = 400
array[4] = 500
array[5] = 600
4.2 Two-dimensional arrays
[hadoop@ruozedata001 22]$ awk 'BEGIN {
> array["0,0"] = 100;
> array["0,1"] = 200;
> array["0,2"] = 300;
> array["1,0"] = 400;
> array["1,1"] = 500;
> array["1,2"] = 600;
> # print array elements
> print "array[0,0] = " array["0,0"];
> print "array[0,1] = " array["0,1"];
> print "array[0,2] = " array["0,2"];
> print "array[1,0] = " array["1,0"];
> print "array[1,1] = " array["1,1"];
> print "array[1,2] = " array["1,2"];
> }'
array[0,0] = 100
array[0,1] = 200
array[0,2] = 300
array[1,0] = 400
array[1,1] = 500
array[1,2] = 600
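The example above uses the literal string "0,0" as an ordinary key. AWK also accepts the subscript form array[i, j], which joins the indices with the built-in SUBSEP string to form a single key. The sketch below shows that form; the variable names key and cells are only illustrative.

```shell
cells=$(awk 'BEGIN {
  array[0, 0] = 100
  array[1, 2] = 600
  # array[1, 2] is stored under the single string key: 1 SUBSEP 2
  if ((1, 2) in array) print "array[1,2] = " array[1, 2]
  # Building the key by hand reaches the same element.
  key = 0 SUBSEP 0
  print "array[0,0] = " array[key]
}')
echo "$cells"
```

Note that array[0, 0] and array["0,0"] are different elements, because SUBSEP is not a comma by default.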
4.3 Deleting elements
[hadoop@ruozedata001 22]$ awk 'BEGIN {
> array["0"] = 100;
> array["1"] = 200;
> array["2"] = 300;
> array["3"] = 400;
> array["4"] = 500;
> array["5"] = 600;
> # print array elements
> print "array[0] = " array["0"];
> print "array[1] = " array["1"];
> print "array[2] = " array["2"];
> print "array[3] = " array["3"];
> print "array[4] = " array["4"];
> delete array["5"]
> print "array[5] = " array["5"];
> }'
array[0] = 100
array[1] = 200
array[2] = 300
array[3] = 400
array[4] = 500
array[5] =
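Printing a deleted element shows an empty string, which cannot be told apart from an element that holds an empty value. A more direct check is the in operator, sketched below. One caveat is that merely reading an element creates it, so the test should come before any reference. The variable names x and status are only illustrative.

```shell
status=$(awk 'BEGIN {
  array["5"] = 600
  delete array["5"]
  # The in operator tests for the key without creating it.
  print (("5" in array) ? "present" : "deleted")
  # Reading the element re-creates it with an empty value.
  x = array["5"]
  print (("5" in array) ? "present" : "deleted")
}')
echo "$status"
```

The first line printed is deleted and the second is present, since the read in between re-created the element.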
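5. AWK control flow
5.1 IF statement
A conditional statement tests a condition and performs the corresponding actions depending on the result. Its syntax is: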
if (condition)
{
action-1
action-2
.
.
action-n
}
[hadoop@ruozedata001 22]$ awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'
10 is even number.
5.2 IF-ELSE statement
An if-else statement runs a different set of actions when the condition is false. Its syntax is:
if (condition)
action-1
else
action-2
[hadoop@ruozedata001 22]$ awk 'BEGIN {num = 11;
> if (num % 2 == 0) printf "%d is even number.\n", num;
> else printf "%d is odd number.\n", num
> }'
11 is odd number.
6. AWK loops
6.1 for loop
The syntax of the for loop is:
for (initialisation; condition; increment/decrement)
action
[hadoop@ruozedata001 22]$ awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'
1
2
3
4
5
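A common use of the for loop in the body block is to walk the fields of a record. The sketch below assumes the built-in variable NF, which holds the number of fields in the current record; the input line is inlined and the variable name listing is only illustrative.

```shell
# Loop over every field of the record; $i is the i-th field.
listing=$(printf 'Amit Physics 80\n' |
  awk '{ for (i = 1; i <= NF; ++i) print i ": " $i }')
echo "$listing"
```

For this three-field record the loop body runs three times, printing each field with its position.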