Linux Commands Explained: mawk

pattern scanning and text processing language


Syntax:

mawk [-F value] [-v var=value] [--] 'program text' [file...]

mawk [-F value] [-v var=value] [-f program-file] [--] [file...]


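Description:

awk is a text-processing language, and mawk is an interpreter for it.

An awk program consists of a sequence of pattern {action} pairs and function definitions. Input is split into records according to RS (the record separator, "\n" by default). Each record is tested against each pattern in turn, and the action of every matching pattern is executed.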


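Options:

-F value

Sets the field separator FS, which is used to split each record into fields.

-f program-file

Reads the awk program from the given file instead of the command line.

-v var=value

Assigns value to the program variable var before the program starts.

--

Marks the end of mawk's options.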
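As a minimal sketch of the -F and -v options, the following splits colon-separated input and filters it against a threshold passed in from the shell. It assumes `awk` on the PATH is mawk or another POSIX awk; the sample data and the variable name `min` are made up for illustration.

```shell
# Split on ":" with -F and pass a threshold in with -v.
# "min" is a hypothetical variable name chosen for this sketch.
out=$(printf 'alice:30\nbob:17\ncarol:42\n' |
  awk -F: -v min=18 '$2 >= min { print $1 }')
echo "$out"
# prints:
# alice
# carol
```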


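The AWK language

1. Program structure

As noted above, an awk program is a sequence of pattern {action} pairs and function definitions.

A pattern can be one of:

BEGIN

END

expression

expression, expression

If the action is omitted, { print } is implied. If the pattern is omitted, every record matches.


Statements are terminated by newlines, semicolons, or both.

Comments start with # and run to the end of the line.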
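The pattern forms can be sketched in one small program. This example assumes `awk` resolves to mawk or another POSIX awk, and the START/STOP markers are invented sample data.

```shell
out=$(printf 'a\nSTART\nb\nc\nSTOP\nd\n' | awk '
  BEGIN { print "begin" }          # runs before any input is read
  /START/, /STOP/                  # range pattern with no action: prints $0
  END { print "records:", NR }     # runs after the last record
')
echo "$out"
# prints:
# begin
# START
# b
# c
# STOP
# records: 6
```

The range pattern matches from the first record matching /START/ through the next record matching /STOP/, inclusive.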


Control flow statements:

if(expr) statement

if( expr ) statement else statement

while ( expr) statement

do statement while (expr)

for(opt_expr ; opt_expr; opt_expr) statement

for(var in array) statement

continue

break
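A short sketch combining several of these statements: it sums the non-negative fields of each record and reports large totals. The input values and the threshold of 10 are arbitrary choices for illustration, and `awk` is assumed to be mawk or another POSIX awk.

```shell
out=$(printf '3 -1 4\n-1 5 9\n' | awk '{
  sum = 0
  for (i = 1; i <= NF; i++) {
    if ($i < 0) continue           # skip negative fields
    sum += $i
  }
  if (sum > 10) {
    print "line " NR ": large"
  } else {
    print "line " NR ": " sum
  }
}')
echo "$out"
# prints:
# line 1: 7
# line 2: large
```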


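2. Data types, conversion, and comparison

There are two basic data types, numeric and string. All numbers are represented and computed as floating-point values.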


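3. Regular expressions

expr ~ /r/

A regular expression is enclosed in slashes. The expression is true if expr matches r and false otherwise; "!~" tests for no match.

/r/ {action} is equivalent to $0 ~ /r/ {action}.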
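The two matching operators might be used like this. The sample records are invented, and `awk` is assumed to be mawk or another POSIX awk.

```shell
out=$(printf 'id=42 ok\nid=x7 bad\nnote\n' | awk '
  $1 ~ /^id=[0-9]+$/  { print "numeric id:", $1 }   # first field is id=<digits>
  $1 !~ /^id=/        { print "no id:", $0 }        # first field lacks the id= prefix
')
echo "$out"
# prints:
# numeric id: id=42
# no id: note
```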


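4. Records and fields

Input is read one record at a time (one line by default) and split into fields according to FS. $0 is the whole record, and $1, $2, ..., $NF are the individual fields. The built-in variable NF holds the number of fields; fields beyond NF evaluate to "".

NR and FNR are incremented by 1 for each record read; both are described under built-in variables below.

Assigning to these variables updates the related ones: assigning to $0 re-splits the fields and resets NF, while assigning to NF or to a field rebuilds $0 from the fields joined by OFS.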
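The effect of field assignment can be seen in a small sketch. It assumes `awk` is mawk or another POSIX awk; the input line and the "-" separator are arbitrary.

```shell
out=$(printf 'a b c d\n' | awk 'BEGIN { OFS = "-" }
{
  $2 = "X"        # assigning a field rebuilds $0 using OFS
  print           # a-X-c-d
  $6 = "f"        # assigning past NF extends the record; $5 becomes ""
  print NF        # 6
  print           # a-X-c-d--f
}')
echo "$out"
```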


5. Expressions and operators

Most operators are the same as in C. The awk-specific ones are in (array membership), ~ and !~ (matching), and $ (field reference).


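6. Arrays

awk provides one-dimensional arrays. Elements are accessed as A[1] or A["1"]; both name the same element, because subscripts are stored as strings.

delete array[expr] removes the corresponding element.

Multi-dimensional arrays are simulated by joining the subscripts with SUBSEP, so A[i, j] is the same element as A[i SUBSEP j]. Membership is tested with:

if( ( i, j) in A ) print A[i, j]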
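A sketch of counting with an array, testing multi-subscript membership, and deleting an element. The array names `count` and `grid` and the input are hypothetical, and `awk` is assumed to be mawk or another POSIX awk.

```shell
out=$(printf 'x 1\ny 2\nx 3\n' | awk '
  { count[$1]++; grid[$1, $2] = 1 }     # grid key is $1 SUBSEP $2
  END {
    print count["x"], count["y"]        # 2 1
    if (("x", 3) in grid) print "x,3 seen"
    if (!(("y", 9) in grid)) print "y,9 missing"
    delete count["x"]
    state = ("x" in count) ? "still there" : "deleted"
    print state                         # deleted
  }')
echo "$out"
```

Testing with `in` does not create the element, whereas referencing `grid["y", 9]` directly would.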


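7. Built-in variables

CONVFMT: format for internal conversion of numbers to strings, default "%.6g"

ENVIRON: array of environment variables; an environment string var=value is stored as ENVIRON[var] = value

FILENAME: name of the current input file

FNR: current record number within FILENAME

FS: field separator; may be a regular expression

NF: number of fields in the current record

NR: current record number across the whole input stream

OFMT: format for printing numbers, default "%.6g"

OFS: output field separator, default is a single space

ORS: output record separator, default is a newline

RS: input record separator, default is a newline

SUBSEP: used to build multiple array subscripts, default "\034"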
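The difference between FNR and NR shows up only with more than one input file. The sketch below creates two throwaway files (the names `one.txt` and `two.txt` are hypothetical) in a temporary directory; it assumes `mktemp -d` is available and that `awk` is mawk or another POSIX awk.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n'    > "$tmp/two.txt"
# FNR restarts at 1 for each file; NR keeps counting across files.
out=$(cd "$tmp" && awk 'BEGIN { OFS = "|" } { print FILENAME, FNR, NR, $0 }' one.txt two.txt)
rm -rf "$tmp"
echo "$out"
# prints:
# one.txt|1|1|a
# one.txt|2|2|b
# two.txt|1|3|c
```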


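8. Built-in functions

(1) String functions

gsub(r, s, t) gsub(r, s)

Global substitution: every match of regular expression r in variable t is replaced by s. If t is omitted, $0 is used. Returns the number of replacements.

index(s, t)

Returns the position of the first occurrence of t in s, or 0 if t does not occur. The first character of s is at position 1.

length(s)

Returns the length of s.

match(s,r)

Returns the index of the first longest match of r in s, or 0 if there is no match. As a side effect, RSTART is set to the return value and RLENGTH to the length of the match (-1 if there is no match).

split(s,A,r)  split(s,A)

Splits s into fields using r as the separator, stores them in array A, and returns the number of fields. If r is omitted, FS is used.

sprintf(format, expr-list)

Returns a string built from expr-list according to format.

sub(r,s,t) sub(r,s)

Like gsub, but replaces at most one match.

substr(s,i,n) substr(s,i)

Returns the substring of s that starts at index i and has length n. If n is omitted, the rest of s from i is returned.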

tolower(s)

toupper(s)
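The string functions can be exercised from a BEGIN block, which needs no input. The strings here are arbitrary sample values, and `awk` is assumed to be mawk or another POSIX awk.

```shell
out=$(awk 'BEGIN {
  s = "hello world"
  print index(s, "o")                    # 5
  print substr(s, 7, 3)                  # wor
  n = gsub(/o/, "0", s)                  # s becomes "hell0 w0rld"
  print n, s                             # 2 hell0 w0rld
  if (match(s, /w[0-9a-z]+/))
    print RSTART, RLENGTH                # 7 5
  k = split("a:b:c", parts, ":")
  print k, parts[3], toupper(parts[1])   # 3 c A
  print sprintf("%05.1f", 3.14159)       # 003.1
}')
echo "$out"
```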

(2) Numeric functions

atan2(y,x)

cos(x)

exp(x)

int(x)

log(x)

rand()

sin(x)

sqrt(x)

srand(expr) srand()
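A few of the numeric functions in use, as a sketch with arbitrary arguments. It assumes `awk` is mawk or another POSIX awk. The value returned by rand() depends on the implementation, so only its range is checked.

```shell
out=$(awk 'BEGIN {
  printf "%.4f\n", atan2(0, -1)      # pi to four decimals: 3.1416
  print int(3.9), int(-3.9)          # int truncates toward zero: 3 -3
  print sqrt(16), exp(0), log(1)     # 4 1 0
  srand(1)                           # seed the generator
  r = rand()
  ok = (r >= 0 && r < 1)             # rand() lies in [0, 1)
  print ok                           # 1
}')
echo "$out"
```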


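9. Input and output

Output is produced with print and printf.

print

Writes $0.

print expr1, expr2, ... , exprn

Writes the expressions separated by OFS and terminated by ORS.

printf format, expr-list

Input is read with getline.

getline

Reads the next record into $0.

getline < file

Reads a record from file into $0.

getline var

Reads the next record into var.

getline var < file

Reads the next record from file into var.

command | getline

Runs command and reads a record from its output pipe into $0.

command | getline var

Runs command and reads a record from its output pipe into var.

getline returns 0 on end of file, -1 on error, and 1 otherwise.


close(expr)

Closes the file or pipe associated with expr.

fflush(expr)

Flushes the output file or pipe associated with expr.


system(expr) runs expr as a shell command and returns its exit status.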
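The usual idiom for reading from a command is a while loop on the getline return value, followed by close. The sketch below assumes `awk` is mawk or another POSIX awk; the commands and the path `/no/such/file` are hypothetical and chosen only to trigger each return value.

```shell
out=$(awk 'BEGIN {
  cmd = "echo one; echo two"
  while ((cmd | getline line) > 0)     # 1 per record, 0 at end of file
    print "got: " line
  close(cmd)                           # release the pipe
  r = (getline x < "/no/such/file")    # unreadable file: getline returns -1
  print "missing file:", r
  status = system("exit 3")            # exit status of the shell command
  print "status:", status
}')
echo "$out"
# prints:
# got: one
# got: two
# missing file: -1
# status: 3
```

Comparing against `> 0` rather than testing truthiness matters: a bare `while (cmd | getline line)` would loop forever on an error, because -1 is also true.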


10. User-defined functions

The syntax is:

function name (args) {statements}

The function body may contain a return statement of the form return opt_expr.
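A sketch of two user-defined functions, one of them recursive. The names `max` and `fact` are hypothetical helpers, and `awk` is assumed to be mawk or another POSIX awk. Extra parameters that the caller does not pass are a common awk convention for local variables.

```shell
out=$(awk '
  # The extra parameter m is never passed by callers, so it acts as a local variable.
  function max(a, b,    m) { m = (a > b) ? a : b; return m }
  function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }
  BEGIN { print max(3, 7), fact(5) }
')
echo "$out"
# prints: 7 120
```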


11. Splitting strings, records, and files


12. Multi-line records
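One common way to handle multi-line records, sketched here as an assumption about what this section would cover, is to set RS to the empty string so that blank lines separate records, and FS to a newline so that each line becomes a field. The sample data is invented, and `awk` is assumed to be mawk or another POSIX awk.

```shell
out=$(printf 'Alice\nParis\n\nBob\nOslo\n' | awk '
  BEGIN { RS = ""; FS = "\n" }       # blank lines separate records; each line is a field
  { print NR ": " $1 " lives in " $2 }
')
echo "$out"
# prints:
# 1: Alice lives in Paris
# 2: Bob lives in Oslo
```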

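13. Program execution

BEGIN actions are executed first.

Then each record is tested against the pattern {action} pairs in order.

Two statements,

next

exit opt_expr

change the flow of control at the pattern level. next reads the next record immediately and restarts pattern testing from the first pattern {action} pair in the program. exit jumps immediately to the END actions, or terminates the program if there are none or if exit occurs inside an END action; opt_expr sets the exit status.

END actions are executed last.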
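The execution order and the effect of next and exit can be sketched as follows. The input and the exit code 2 are arbitrary, and `awk` is assumed to be mawk or another POSIX awk.

```shell
status=0
out=$(printf '# comment\n1\n2\nstop\n3\n' | awk '
  /^#/      { next }               # read the next record; later rules are skipped
  /^stop$/  { exit 2 }             # jump to END; 2 becomes the exit status
            { sum += $1 }
  END       { print "sum:", sum }
') || status=$?
echo "$out (exit status $status)"
# prints: sum: 3 (exit status 2)
```

The record "3" is never read, because exit stops input processing before it is reached.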

