Notes on The AWK Programming Language & Effective awk Programming

The AWK Programming Language


awk '$3 > 0 { print $1, $2 * $3 }' emp.data    the expressions in a print statement are separated by commas

pattern {action}

 

Pattern only, e.g. $3 == 0: the default action, print, prints each matching line.

Action only, e.g. { print $1 }: prints the first field of every line.
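A minimal sketch of the three rule shapes described above. The records are made up as a stand-in for emp.data (its real contents are not shown in these notes), and the shell variable names paid, idle and names are only for illustration.

```shell
# Made-up stand-in for emp.data: name, hourly rate, hours worked
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20'

# pattern + action: name and pay of everyone who worked
paid=$(printf '%s\n' "$data" | awk '$3 > 0 { print $1, $2 * $3 }')

# pattern only: the default action prints each matching line
idle=$(printf '%s\n' "$data" | awk '$3 == 0')

# action only: runs for every line
names=$(printf '%s\n' "$data" | awk '{ print $1 }')

printf '%s\n' "$paid" "$idle" "$names"
```

The first command prints Kathy 40 and Mark 100, the second prints the two records whose third field is 0, and the third prints all four names.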

 

Running an awk program:

awk 'program' input files

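awk '$3 == 0 { print $1 }' file1 file2    multiple input files

awk 'program'    with no input files, awk reads standard input (the terminal, if nothing is redirected)

To run an awk program stored in a file: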

awk -f progfile inputfile

 

awk has only two types of data: strings and numbers.

{ print } is equivalent to { print $0 }

Each time a record is read, $0 holds the contents of that record.

 

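Printing line numbers:

{ print NR }    NR is the number of records read so far

printf(format, value1, value2, ..., valueN)    printf adds no newline; put \n in the format

print adds a newline automatically.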

 

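The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last file has been processed.

In other words, the BEGIN action runs before any ordinary pattern matching starts.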
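A small sketch of where BEGIN and END run relative to the ordinary rules, combined with NR and printf from the notes above. The three input lines are made up.

```shell
out=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN { print "start" }                 # runs once, before the first line is read
      { printf("%d: %s\n", NR, $0) }    # printf needs an explicit \n
END   { print NR, "lines read" }        # runs once, after the last line
')
printf '%s\n' "$out"
```

This prints start, then the three numbered lines, then 3 lines read. In END, NR still holds the count of the last record read.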

 

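print n, "employees, total pay is", pay,
      "average pay is", pay/n

A long statement can be broken across lines after a comma; a backslash at the end of a line also continues the statement.

{ print $NF }    prints the last field of every line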

 

{ temp = $1; $1 = $2; $2 = temp; print }    swaps the first two fields

 

{ print \
     $1,
     $2,
     $3 }

 

 

Summary of Patterns

1. BEGIN { statements }
The statements are executed once before any input has been read.
2. END { statements }
The statements are executed once after all input has been read.
3. expression { statements }
The statements are executed at each input line where the expression is true, that is,
nonzero or nonnull.
4. /regular expression/ { statements }
The statements are executed at each input line that contains a string matched by the
regular expression.
5. compound pattern { statements }
A compound pattern combines expressions with && (AND), || (OR), ! (NOT), and
parentheses; the statements are executed at each input line where the compound
pattern is true.
6. pattern1, pattern2 { statements }
A range pattern matches each input line from a line matched by pattern1 to the next
line matched by pattern2, inclusive; the statements are executed at each matching
line.
BEGIN and END do not combine with other patterns. A range pattern cannot be part of
any other pattern. BEGIN and END are the only patterns that require an action.
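To make the compound and range forms concrete, here is a short sketch on made-up input. Both rules omit the action, so matching lines are simply printed.

```shell
# Made-up input, one word per line
input='one
two
three
four
five'

# Range pattern: from the first line matching /two/ through the next line matching /four/
range=$(printf '%s\n' "$input" | awk '/two/, /four/')

# Compound pattern: lines that contain an "o" and do not contain a "t"
compound=$(printf '%s\n' "$input" | awk '/o/ && !/t/')

printf '%s\n' "$range" "$compound"
```

The range rule prints two, three and four; the compound rule prints one and four.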

 

By default FS is a single blank, which makes runs of blanks and/or tabs separate the fields.

 

String-Matching Patterns
1. /regexpr/
Matches when the current input line contains a substring matched by regexpr.
2. expression ~ /regexpr/
Matches if the string value of expression contains a substring matched by regexpr.
3. expression !~ /regexpr/
Matches if the string value of expression does not contain a substring matched by
regexpr.
Any expression may be used in place of /regexpr/ in the context of ~ and !~.

$4 ~ /Asia/    $4 !~ /Asia/    $0 ~ /Asia/ (the same as /Asia/)

 

Regular Expressions
1. The regular expression metacharacters are:
\ ^ $ . [ ] | ( ) * + ?
2. A basic regular expression is one of the following:
a nonmetacharacter, such as A, that matches itself.
an escape sequence that matches a special symbol: \t matches a tab (see Table 2-2).
a quoted metacharacter, such as \*, that matches the metacharacter literally.
^, which matches the beginning of a string.
$, which matches the end of a string.
., which matches any single character.
a character class: [ABC] matches any of the characters A, B, or C.
character classes may include abbreviations: [A-Za-z] matches any single letter.
a complemented character class: [^0-9] matches any character except a digit.
3. These operators combine regular expressions into larger ones:
alternation: A|B matches A or B.
concatenation: AB matches A immediately followed by B.
closure: A* matches zero or more A's.
positive closure: A+ matches one or more A's.
zero or one: A? matches the null string or A.
parentheses: (r) matches the same strings as r does.

 

In a regular expression, an unquoted caret ^ matches the beginning of a
string, an unquoted dollar-sign $ matches the end of a string, and an unquoted
period . matches any single character.

 

^C  matches a C at the beginning of a string
C$  matches a C at the end of a string
^C$ matches the string consisting of the single character C
^.$ matches any string containing exactly one character
^...$ matches any string containing exactly three characters
... matches any three consecutive characters
\.$ matches a period at the end of a string


[AEIOU] matches any of the characters A, E, I, O, or U

[^0-9] matches any character except a digit

[^a-zA-Z] matches any character except an upper or lower-case letter

^[^^] matches any character except a caret at the beginning of a string

^[ABC] matches an A, B or C at the beginning of a string
^[^ABC] matches any character at the beginning of a string, except A, B or C
[^ABC] matches any character other than an A, B or C
^[^a-z]$ matches any single-character string, except a lower-case letter
B*   matches the null string or B or BB, and so on
AB*C matches AC or ABC or ABBC, and so on
AB+C matches ABC or ABBC or ABBBC, and so on
ABB*C also matches ABC or ABBC or ABBBC, and so on
AB?C matches AC or ABC
[A-Z]+ matches any string of one or more upper-case letters
(AB)+C matches ABC, ABABC, ABABABC, and so on


/^[0-9]+$/
matches any input line that consists of only digits
/^[0-9][0-9][0-9]$/
exactly three digits
/^(\+|-)?[0-9]+\.?[0-9]*$/
a decimal number with an optional sign and optional fraction
/^[+-]?[0-9]+[.]?[0-9]*$/
also a decimal number with an optional sign and optional fraction
/^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$/
a floating point number with optional sign and optional exponent
/^[A-Za-z][A-Za-z0-9]*$/
a letter followed by any letters or digits (e.g., awk variable name)
/^[A-Za-z]$|^[A-Za-z][0-9]$/
a letter or a letter followed by a digit (e.g., variable name in Basic)
/^[A-Za-z][0-9]?$/
also a letter or a letter followed by a digit
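As a quick check of the floating-point pattern listed above, the sketch below classifies a handful of made-up strings. The labels "number" and "not a number" are only for illustration.

```shell
result=$(printf '%s\n' 42 -3.14 +.5 1e10 6.02E+23 abc 1.2.3 | awk '
# The anchors ^ and $ force the whole line to match, not just a substring
/^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$/ { print $0, "number"; next }
{ print $0, "not a number" }')
printf '%s\n' "$result"
```

The first five strings are reported as numbers. abc and 1.2.3 are rejected, the latter because the trailing .3 cannot be matched before the closing $.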

/Asia/ || /Europe/    can also be written as /Asia|Europe/

Range patterns take two patterns separated by a comma: /aaa/, /bbb/

FNR == 1, FNR == 5 { print FILENAME ": " $0 }    is equivalent to

FNR <= 5 { print FILENAME ": " $0 }

 

PATTERN          EXAMPLE                     MATCHES
BEGIN            BEGIN                       before any input has been read
END              END                         after all input has been read
expression       $3 < 100                    lines in which third field is less than 100
string-matching  /Asia/                      lines that contain Asia
compound         $3 < 100 && $4 == "Asia"    lines in which third field is less than 100 and fourth field is Asia
range            NR==10, NR==20              tenth to twentieth lines of input inclusive


 

Actions
The statements in actions can include:
expressions, with constants, variables, assignments, function calls, etc.
print expression-list
printf(format, expression-list)
if (expression) statement
if (expression) statement else statement
while (expression) statement
for (expression ; expression ; expression) statement
for (variable in array) statement
do statement while (expression)
break
continue
next
exit
exit expression
{ statements }

 

 

VARIABLE   MEANING                                      DEFAULT
ARGC       number of command-line arguments             -
ARGV       array of command-line arguments              -
FILENAME   name of current input file                   -
FNR        record number in current file                -
FS         controls the input field separator           " "
NF         number of fields in current record           -
NR         number of records read so far                -
OFMT       output format for numbers                    "%.6g"
OFS        output field separator                       " "
ORS        output record separator                      "\n"
RLENGTH    length of string matched by match function   -
RS         controls the input record separator          "\n"
RSTART     start of string matched by match function    -
SUBSEP     subscript separator                          "\034"

 

 

Expressions
1. The primary expressions are:
numeric and string constants, variables, fields, function calls, array elements.
2. These operators combine expressions:
assignment operators = += -= *= /= %= ^=
conditional expression operator ? :
logical operators || (OR), && (AND), ! (NOT)
matching operators ~ and !~
relational operators < <= == != > >=
concatenation (no explicit operator)
arithmetic operators + - * / % ^
unary + and -
increment and decrement operators ++ and -- (prefix and postfix)
parentheses for grouping

 

 

for (variable in array)
statement

delete array[subscript]

 

for (i in pop)
delete pop[i]
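A brief sketch of the array statements above: accumulate into an array, read elements back, then empty the array with for ... in and delete. The array name pop follows the notes; the input records are made up.

```shell
out=$(printf 'Asia 10\nEurope 5\nAsia 7\n' | awk '
{ pop[$1] += $2 }                       # subscripts are strings; new elements start out empty
END {
    print "Asia", pop["Asia"]
    print "Europe", pop["Europe"]
    for (c in pop) delete pop[c]        # remove every element, one at a time
    n = 0
    for (c in pop) n++                  # nothing is left to visit
    print "left", n
}')
printf '%s\n' "$out"
```

The traversal order of for (c in pop) is not specified, which is why the sketch prints named elements rather than looping over them for output.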

 

 

The print statement has two forms:
print expr1, expr2, ..., exprn
print(expr1, expr2, ..., exprn)

 

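{ print $1, $2 > $3 }    writes the first and second fields to the file named by the third field

{ print $1, ($2 > $3) }    here $2 and $3 are compared

As these two examples show, an expression in the argument list that contains a relational operator must be enclosed in parentheses; otherwise the > is treated as output redirection.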
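The difference is easy to observe. In this sketch the file name out.txt and the scratch directory are assumptions made for the demonstration; the input records are made up.

```shell
dir=$(mktemp -d)      # scratch directory so nothing is left behind

# Without parentheses, > is redirection: fields 1 and 2 go to the file named by field 3
printf 'a 5 %s\n' "$dir/out.txt" | awk '{ print $1, $2 > $3 }'
written=$(cat "$dir/out.txt")

# With parentheses, > is a comparison that yields 1 (true) or 0 (false)
compared=$(printf 'a 5 3\nb 2 9\n' | awk '{ print $1, ($2 > $3) }')

rm -rf "$dir"
printf '%s\n' "$written" "$compared"
```

The file receives a 5, while the second command prints a 1 and b 0 on standard output.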

 

Output into pipes:

print | command

printf("%15s\t%6d\n", c, pop[c]) | "sort -t'\t'"

print message | "cat 1>&2" # redirect cat to stderr

system( "echo '" message "' 1>&2") # redirect echo to stderr

print message > "/dev/tty" # write directly on terminal
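A minimal sketch of printing into a pipe. It assumes a sort command is available on the PATH; the input records are made up.

```shell
out=$(printf 'pear 3\napple 7\nfig 5\n' | awk '
{ print $1, $2 | "sort" }          # every print to the same command string shares one pipe
END { close("sort")                # finish the pipe and wait for sort before printing more
      print "done" }')
printf '%s\n' "$out"
```

Because close("sort") is called first, the sorted lines (apple, fig, pear) appear before done. Without the close, sort would only see end of input when awk exits.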

 

 

Output Statements

print
    print $0 on standard output
print expression, expression, ...
    print expressions, separated by OFS, terminated by ORS
print expression, expression, ... >filename
    print on file filename instead of standard output
print expression, expression, ... >>filename
    append to file filename instead of overwriting previous contents
print expression, expression, ... | command
    print to standard input of command
printf(format, expression, expression, ...)
printf(format, expression, expression, ...) >filename
printf(format, expression, expression, ...) >>filename
printf(format, expression, expression, ...) | command
    printf statements are like print but the first argument specifies output format
close(filename), close(command)
    break connection between print and filename or command
system(command)
    execute command; value is status return of command

 

 

awk -F',[ \t]*|[ \t]+' 'program'    setting the separator on the command line is equivalent to BEGIN { FS = ",[ \t]*|[ \t]+" }
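A sketch comparing the two ways of setting a regular-expression field separator. It uses the separator a comma followed by optional blanks, or a run of blanks, which is an assumed reading of the garbled line above; the input line is made up.

```shell
line='a, b   c,d'

# Separator given with -F on the command line
from_option=$(printf '%s\n' "$line" |
    awk -F',[ \t]*|[ \t]+' '{ print NF ": " $1 "|" $2 "|" $3 "|" $4 }')

# The same separator assigned to FS in a BEGIN action
from_begin=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = ",[ \t]*|[ \t]+" } { print NF ": " $1 "|" $2 "|" $3 "|" $4 }')

printf '%s\n' "$from_option" "$from_begin"
```

Both commands print 4: a|b|c|d.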

 

EXPRESSION           SETS
getline              $0, NF, NR, FNR
getline var          var, NR, FNR
getline <file        $0, NF
getline var <file    var
cmd | getline        $0, NF
cmd | getline var    var
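The table can be checked with a small experiment. The scratch file, its two records, and the variable names line, word and file are all made up for this sketch.

```shell
f=$(mktemp)                                   # scratch file with two made-up records
printf 'x1 x2\ny1 y2 y3\n' > "$f"

out=$(printf 'main1\nmain2\n' | awk -v file="$f" '
NR == 1 {
    getline line < file                       # sets only line; $0, NF and NR are unchanged
    print "var <file:", line, "| $0 =", $0, "| NR =", NR
    getline < file                            # sets $0 and NF, but not NR
    print "<file:", $0, "| NF =", NF, "| NR =", NR
    "echo hello" | getline word               # sets only word
    print "cmd | getline var:", word
    getline                                   # next main record: sets $0, NF, NR, FNR
    print "plain getline:", $0, "| NR =", NR
}')
rm -f "$f"
printf '%s\n' "$out"
```

NR stays at 1 through the reads from the file and from the command, and only advances to 2 on the plain getline, which consumes main2 from the main input.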

 

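Command-line arguments:

awk -f progfile a v=1 b

ARGC is 4:

ARGV[0] awk

ARGV[1] a

ARGV[2] v=1

ARGV[3] b

If the awk program itself appears on the command line, it is not treated as an argument, and neither is an option such as -F:

awk -F'\t' '$3 > 100' countries

ARGC = 2

ARGV[1] is countries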
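A one-line sketch that lists ARGV. Because the program consists only of a BEGIN action, awk never tries to open a or b, so they need not exist.

```shell
args=$(awk 'BEGIN { for (i = 0; i < ARGC; i++) printf("ARGV[%d]=%s\n", i, ARGV[i]) }' a v=1 b)
printf '%s\n' "$args"
```

The program text itself does not show up in ARGV. The value of ARGV[0] depends on the implementation (the name the interpreter reports for itself), so only entries 1 to 3 are predictable here.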

A random integer between 1 and n: int(n * rand()) + 1 (rand() itself returns a number r with 0 <= r < 1)
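A quick sanity check of the integer formula. The seed 1 and n = 6 are arbitrary choices for this sketch.

```shell
bad=$(awk 'BEGIN {
    srand(1)                               # fixed seed so repeated runs draw the same numbers
    n = 6
    for (i = 0; i < 1000; i++) {
        r = int(n * rand()) + 1            # rand() is in [0, 1), so r is one of 1..n
        if (r < 1 || r > n) bad++
    }
    print bad + 0                          # + 0 prints 0 instead of an empty string
}')
printf '%s\n' "$bad"
```

The count of out-of-range draws is 0. Without a call to srand, rand() starts from the same seed on every run.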

 

AWK SUMMARY

 

awk [-Fs] 'program' optional list of filenames

awk [-Fs] -f progfile optional list of filenames

 

pattern { action }

function name (parameter-list) { statement }

Patterns
BEGIN
END
expression
/regular expression/
pattern && pattern
pattern || pattern
!pattern
(pattern)
pattern , pattern

 

Actions
An action is a sequence of statements of the following kinds:
break
continue
delete array-element
do statement while (expression)
exit [expression]
expression
if (expression) statement [else statement]
input-output statement
for (expression; expression; expression) statement
for (variable in array) statement
next
return [expression]
while (expression) statement
{ statements }

Input-output
close(expr)                    close file or pipe denoted by expr
getline                        set $0 from next input record; set NF, NR, FNR
getline <file                  set $0 from next record of file; set NF
getline var                    set var from next input record; set NR, FNR
getline var <file              set var from next record of file
print                          print current record
print expr-list                print expressions in expr-list
print expr-list >file          print expressions on file
printf fmt, expr-list          format and print
printf fmt, expr-list >file    format and print on file
system(cmd-line)               execute command cmd-line, return status




Printf format conversions

These conversions are recognized in printf and sprintf statements.

%c ASCII character
%d decimal number
%e [-]d.ddddddE[+-]dd
%f [-]ddd.dddddd
%g e or f conversion, whichever is shorter,
with nonsignificant zeros suppressed
%o unsigned octal number
%s string
%x unsigned hexadecimal number
%% print a %; no argument is converted
Additional parameters may lie between the % and the control letter:
- left-justify expression in its field
width pad field to this width as needed; leading 0 pads with zeros
.prec maximum string width or digits to right of decimal point
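A few of these conversions in action. The square brackets are only there to make the padding visible.

```shell
out=$(awk 'BEGIN {
    printf("[%d]\n", 42.9)          # %d truncates to an integer
    printf("[%5d]\n", 42)           # width 5, right-justified
    printf("[%-5d]\n", 42)          # - left-justifies
    printf("[%05d]\n", 42)          # leading 0 pads with zeros
    printf("[%.2f]\n", 3.14159)     # two digits after the decimal point
    printf("[%.3s]\n", "abcdef")    # at most three characters of the string
    printf("[%x %o]\n", 255, 8)     # hexadecimal and octal
    printf("[%c]\n", 65)            # character with code 65
    printf("[100%%]\n")             # %% prints a literal percent sign
}')
printf '%s\n' "$out"
```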

 

Built-in variables
The following built-in variables can be used in any expression:
ARGC number of command-line arguments
ARGV array of command-line arguments (ARGV[0..ARGC-1])
FILENAME name of current input file
FNR input record number in current file
FS input field separator (default blank)
NF number of fields in current input record
NR input record number since beginning
OFMT output format for numbers (default "%.6g")
OFS output field separator (default blank)
ORS output record separator (default newline)
RLENGTH length of string matched by regular expression in match
RS input record separator (default newline)
RSTART beginning position of string matched by match
SUBSEP separator for array subscripts of form [i,j,...] (default "\034")



Limits
Any particular implementation of awk enforces some limits. Here are typical values:
100 fields
3000 characters per input record
3000 characters per output record
1024 characters per field
3000 characters per printf string
400 characters maximum literal string
400 characters in character class
15 open files
1 pipe
double-precision floating point


Effective awk Programming

pattern {action}

 

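awk 'length($0) > 80' data    prints every line longer than 80 characters

This rule has a pattern but no action, so the default action, print $0, prints the record.

awk '{ if (length($0) > max) max = length($0) } END { print max }' data

The first rule has no pattern, only an action, so it runs for every line.

awk 'NF > 0' data    prints every line that has at least one field

Multiple rules, for example two:

awk '/12/ { print $0 }
     /21/ { print $0 }' BBS-list inventory-shipped

Each rule has its own action. This command prints the lines that contain 12 or 21; a line that matches both rules is printed twice, once by each action.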
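The double printing is easy to see on a few made-up lines. The labels rule 1 and rule 2 are added here only to show which rule produced each line; BBS-list and inventory-shipped are not needed for the sketch.

```shell
out=$(printf '12 only\n21 only\n12 and 21\nneither\n' | awk '
/12/ { print "rule 1:", $0 }
/21/ { print "rule 2:", $0 }')
printf '%s\n' "$out"
```

The line 12 and 21 is printed by both rules, in the order the rules appear in the program, and neither is not printed at all.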

 

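awk '/This regular expression is too long, so continue it\
on the next line/ { print $1 }'

A backslash continues a line. To split a statement across two lines at a point where a newline would otherwise end it, end the first line with a backslash. The backslash must be the last character on the line to be recognized as a continuation character.

awk 'BEGIN {
  print \
   "hello world"
  }'

awk is a line-oriented language. Each rule's action has to begin on the same line as the pattern. To put the pattern and the action on separate lines, you must use backslash continuation; there is no other option.

Multiple rules on the same line must be separated by semicolons:

awk '/12/ { print $0 } ; /21/ { print $0 }' BBS-list inventory-shipped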

 

awk -f source-file input-file1 input-file2

ls -l  | awk '{ x += $5 } END { print "total bytes: " x }'  

awk -F: '{ print $1 }' /etc/passwd | sort

exp ~ /regexp/, exp !~ /regexp/    match and no-match tests

wh{3}y Matches whhhy, but not why or whhhhy.
wh{3,5}y Matches whhhy, whhhhy, or whhhhhy, only.
wh{2,}y Matches whhy or whhhy, and so on.

 

$ echo a b c d | awk '{ OFS = ":"; $2 = ""
>                       print $0; print NF }'
a::c:d
4

$ echo a b c d | awk '{ OFS = ":"; $2 = ""; $6 = "new"
>                       print $0; print NF }'
a::c:d::new
6

$ echo a b c d e f | awk '{ print "NF =", NF;
>                           NF = 3; print $0 }'
NF = 6
a b c

 

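BEGIN rules

A BEGIN rule runs before any input is read. There is no input record yet, so while a BEGIN rule is executing, references to $0 and to the fields yield the empty string or zero.

The next and nextfile statements are not allowed in BEGIN or END rules.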

 

 

echo -n "Enter search pattern: "
read pattern
awk "/$pattern/ "'{ nmatches++ }
     END { print nmatches, "found" }' /path/to/data

The first part is double-quoted, which lets the shell substitute the value of the pattern variable inside the quotes.
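An alternative sketch that avoids splicing shell text into the program: pass the value with the standard -v option and match it dynamically with ~. The awk variable name pat and the sample input are assumptions for this illustration; the fixed value stands in for what read would have stored.

```shell
pattern='b.d'          # stands in for the value typed at the read prompt

count=$(printf 'bad\nbed\nbud\nbox\n' | awk -v pat="$pattern" '
$0 ~ pat { nmatches++ }                  # pat is used as a dynamic regular expression
END { print nmatches + 0, "found" }')    # + 0 prints 0 when nothing matched
printf '%s\n' "$count"
```

This prints 3 found. One caveat: awk processes backslash escape sequences in a value assigned with -v, so a pattern containing backslashes may need them doubled.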

 

switch (expression) {
case value or regular expression:
case-body
default:
default-body

}


switch (NR * 2 + 1) {
case 3:
case "11":
print NR - 1
break
case /2[[:digit:]]+/:
print NR
default:
print NR + 1
case -1:
print NR * -1
}

 

systime()

mktime(datespec)

strftime([format [, timestamp [, utc-flag]]])

 

and(v1, v2)
Returns the bitwise AND of the values provided by v1 and v2.

or(v1, v2)
Returns the bitwise OR of the values provided by v1 and v2.

xor(v1, v2)
Returns the bitwise XOR of the values provided by v1 and v2.

compl(val)
Returns the bitwise complement of val.

lshift(val, count)
Returns the value of val, shifted left by count bits.

rshift(val, count)
Returns the value of val, shifted right by count bits.

 

