GNU awk notes

# The `awk` Language

 

- how to run

 

```awk

awk 'program' input-file1 input-file2 ...

```

```awk

awk -f program-file input-file1 input-file2 ...

```

 

A program consists of a sequence of rules:

```awk

pattern { action }

pattern { action }

...

```

 

- base notes

 

- `$0`: current line

- a pattern written between slashes (`/.../`) is a regular expression

- either the pattern or the action can be omitted, but not both.

- pattern omitted: action is performed for every input line.

- action omitted: print all lines that match the pattern.
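
The two omission rules above can be sketched as follows. The sample records are invented for illustration: the first command has a pattern and no action, the second has an action and no pattern.

```shell
# Hypothetical sample input.
input='apple 3
banana 7
cherry 5'

# Pattern only: every line that matches /an/ is printed.
pattern_only=$(printf '%s\n' "$input" | awk '/an/')
printf '%s\n' "$pattern_only"

# Action only: the action runs for every input line.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')
printf '%s\n' "$action_only"
```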

 

- `-f`

 

The -f option may be used more than once on the command line. If it is, awk reads its program source from all of the named files, as if they had been concatenated together into one big file.

 

- Command-Line Arguments

 

- all command-line arguments are available to awk in the `ARGV` array.

- command-line options and the program text are omitted from `ARGV`
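
A small sketch of what ends up in `ARGV`. The file names `a.txt` and `b.txt` and the variable `unused` are placeholders. The loop starts at 1 because `ARGV[0]` holds the name `awk` was invoked under, which varies between installations.

```shell
# A BEGIN-only program never opens its input files, so the placeholder
# names a.txt and b.txt do not need to exist.
args=$(awk -v unused=1 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' a.txt b.txt)
printf '%s\n' "$args"
```

Neither the `-v` option nor the program text shows up in the listing.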

 

- Environment variables used by `gawk`

 

- `AWKPATH`

 

- `@include`

 

## Regular Expression used by `awk`

 

- `~` and `!~`

 

Regex comparison operators. For example, the following expression is true if `exp`, taken as a string, matches `regexp`:

```awk

exp ~ /regexp/

```
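
As an illustration of the match operator and its negation `!~`, the sketch below filters on the second field. The user names and shell paths are made up.

```shell
# Hypothetical input: user name and login shell.
input='alice /bin/bash
bob /bin/zsh
carol /usr/bin/bash'

# $2 ~ /bash$/ is true when the second field ends in "bash".
bash_users=$(printf '%s\n' "$input" | awk '$2 ~ /bash$/ { print $1 }')
# !~ is true when the match fails.
other_users=$(printf '%s\n' "$input" | awk '$2 !~ /bash$/ { print $1 }')
printf '%s\n' "$bash_users" "$other_users"
```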

 

- `\`

 

escape character

 

- `^` and `$`

- `.`

 

matches any single character

 

- `[...]` and `[^...]`

 

bracket expression: matches any one of the listed characters; with a leading `^`, matches any one character that is not listed

- `*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`

 

repetition operators, applied to the preceding expression

 

- character classes (valid only inside a bracket expression, e.g. `[[:digit:]]`)

 

- `[:alnum:]`

- `[:alpha:]`

- `[:blank:]` space and tab

- `[:cntrl:]`

- `[:digit:]`

- `[:graph:]`

- `[:lower:]`

- `[:print:]`

- `[:punct:]`

- `[:space:]`

- `[:upper:]`

- `[:xdigit:]`

 

- summary

 

- written as `/.../`

- match expression using `~` and `!~`
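
A short sketch combining the operators listed in this section: anchors, a bracket expression, a character class, and a repetition operator. The input words are arbitrary test data.

```shell
# Keep lines made up only of digits: ^ and $ anchor the match,
# [[:digit:]] is a character class inside a bracket expression,
# and + repeats it one or more times.
digits=$(printf '%s\n' '2024' 'v2' '007' 'x' | awk '/^[[:digit:]]+$/')

# Negated bracket expression: lines containing at least one character
# that is neither a letter nor a digit.
odd=$(printf '%s\n' 'abc' 'a-b' 'a1' | awk '/[^[:alnum:]]/')
printf '%s\n' "$digits" "$odd"
```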

 

## Reading input files

 

- The input is read in units called records, and is processed by the rules of your program one record at a time. By default, each record is one line. Each record is automatically split into chunks called fields.

 

- fields are separated by whitespace by default and are referred to as `$1`, `$2`, and so on.

- `$0` refers to the whole input record.

- `NR`: the number of records read so far.

- `NF`: the number of fields in the current record.
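
A minimal sketch of records and fields with the default separators. The two input lines are invented. `$NF` is the last field, because `NF` holds the field count.

```shell
fields=$(printf 'one two three\nfour five\n' | awk '{ print NR, NF, $1, $NF }')
printf '%s\n' "$fields"
```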

 

> it is a common error to try to change the field separators in a record simply by setting FS and OFS, and then expecting a plain ‘print’ or ‘print $0’ to print the modified record.

> But this does not work, because nothing was done to change the record itself. Instead, you must force the record to be rebuilt, typically with a statement such as ‘$1 = $1’, as described earlier.

 

- `FS`: specify the field separator, whitespace by default.

 

- `-F`: set field separator in command line.

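- `RS=""`: records are separated by one or more blank lines. This is close to `RS="\n\n+"`, except that with `RS=""` leading newlines in the input are ignored and a newline always acts as a field separator as well.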
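
The separator behaviour described in this section can be checked with a sketch like the one below. It assumes a colon-separated sample line and a tiny two-paragraph input, both invented.

```shell
# Setting OFS alone does not change $0: the record was never rebuilt.
unchanged=$(printf 'a:b:c\n' | awk -F: -v OFS=- '{ print $0 }')
# Assigning to a field ($1 = $1) forces awk to rebuild $0 using OFS.
rebuilt=$(printf 'a:b:c\n' | awk -F: -v OFS=- '{ $1 = $1; print $0 }')
printf '%s\n' "$unchanged" "$rebuilt"

# Paragraph mode: with RS="" each run of lines between blank lines is
# one record, and newline also separates fields.
paragraphs=$(printf 'x 1\ny 2\n\nz 3\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }')
printf '%s\n' "$paragraphs"
```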

 

## Printing Output

 

- `print` with no arguments is equivalent to `print $0`. To print a blank line, use `print ""`.

- specify output field separator using `OFS`, the initial value is `" "`

- specify output record separator using `ORS`, the initial value is `"\n"`

- `printf` format specifiers

 

- start with `%` and end with a _format-control letter_.

- flags

- `-`: left-justify

- `+`: always supply a sign

- `0`: padding with `0` instead of spaces

 

- `print` and `printf` redirection

- `print items >> output-file`

_redirect and append_
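
A sketch of the three flags and of appending redirection. The temporary directory, the file name `out.log`, and the awk variable `log_file` are assumptions made for this demo only.

```shell
# printf flags: - left-justifies, + always prints a sign, 0 pads with zeros.
formatted=$(awk 'BEGIN { printf "%-5s|%+d|%05d\n", "ab", 7, 42 }')
printf '%s\n' "$formatted"

# >> appends to the named file instead of truncating it.
tmpdir=$(mktemp -d)
log="$tmpdir/out.log"
printf 'first\n' > "$log"
printf 'second\n' | awk -v log_file="$log" '{ print $0 >> log_file }'
appended=$(cat "$log")
printf '%s\n' "$appended"
rm -rf "$tmpdir"
```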

 

## Expressions

 

- when a regexp constant such as `/foo/` appears by itself, it has the same meaning as `($0 ~ /foo/)`

 

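> the expression `matches = /foo/` assigns either 0 or 1 to the variable `matches`, depending on whether the current record matches. It is equivalent to `matches = ($0 ~ /foo/)`. (`match` itself is the name of a built-in function, so it cannot be used as a variable name.)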

 

- strongly typed regexp constants

- preceded by `'@'` sign:

 

```awk

re = @/foo/ # Regexp variable

```

 

- variables

- variables are initialized to the empty string, which is zero if converted to a number.

- assign variable on the command line

```awk

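variable=text # placed among the file names: assigned when awk reaches that argument (after BEGIN, before the next file is read)

# or

-v variable=text # assigned before the program starts, so it is visible in BEGIN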

```
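
The difference between the two forms can be sketched as below. The variable names `greeting` and `tag` and the two temporary files are hypothetical.

```shell
# -v assignments happen before BEGIN runs.
early=$(awk -v greeting=hello 'BEGIN { print greeting }')
printf '%s\n' "$early"

# Assignments placed among the file names take effect when awk reaches
# them, so each file can see a different value.
tmpdir=$(mktemp -d)
printf 'a\n' > "$tmpdir/one.txt"
printf 'b\n' > "$tmpdir/two.txt"
tagged=$(awk '{ print tag ":" $0 }' tag=first "$tmpdir/one.txt" tag=second "$tmpdir/two.txt")
printf '%s\n' "$tagged"
rm -rf "$tmpdir"
```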

 

- conversion between numbers and strings

- number to string: concatenate number with empty string `""`.

- string to number: add zero to the string.

- strings that cannot be interpreted as valid numbers convert to zero.

- numbers are converted using the `sprintf()` function with `CONVFMT` as the format specifier.

> `CONVFMT`'s default value is `"%.6g"`

- as a special case, if a number is an integer, the result of conversion is **_always_** an integer.

- example

```awk

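awk 'BEGIN {
    OFMT = "%2.1f"      # used by print for numeric values
    CONVFMT = "%1.3f"   # used for number-to-string conversion
    a = 12.23232
    print a ""          # concatenation converts with CONVFMT: 12.232
    print a             # print uses OFMT: 12.2
}'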

```
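
The remaining conversion rules, including the integer special case, can be sketched like this. The literal values are arbitrary.

```shell
conv=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    print ("3abc" + 0)     # the leading numeric prefix is used
    print ("abc" + 0)      # not a number: converts to 0
    print (3.14159 "")     # non-integer: formatted with CONVFMT
    print (17 "")          # integer: converted as an integer, CONVFMT is ignored
}')
printf '%s\n' "$conv"
```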

 

- there is only one string operation: concatenation.

 

```awk

awk 'BEGIN {print -12 "" -24}'

# parsed as -12 ("" - 24), so the output is -12-24

```

 

- Fields, `getline` input, `FILENAME`, `ARGV` elements, `ENVIRON` elements and the elements of an array created by `match()`, `split()`, and `patsplit()` that are numeric strings have the **_strnum_** attribute. Otherwise, they have the _string_ attribute. Uninitialized variables also have the **_strnum_** attribute.

> the right-hand operand of the `~` and `!~` operators may be either a regex constant (`/.../`) or an ordinary expression. In the latter case, the value of the expression as a string is used as a dynamic regex.

 

- boolean expressions

- Boolean expressions have numeric values (1 for true, 0 for false) when the result is stored in a variable or used in arithmetic.
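
A sketch of two points from this section: numeric-looking fields compare as numbers while string constants compare as strings, and a boolean result (including a bare regexp constant) is an ordinary number. The variable names and input lines are made up.

```shell
# Fields holding numeric strings (strnum) compare numerically; string
# constants compare character by character. Each result is 1 or 0.
cmp=$(printf '10 9\n' | awk '{ num = ($1 > $2); str = ("10" > "9"); print num, str }')
printf '%s\n' "$cmp"

# A bare /foo/ means ($0 ~ /foo/), so it can be added up like a number.
count=$(printf 'foo\nbar\nfood\n' | awk '{ hits += /foo/ } END { print hits }')
printf '%s\n' "$count"
```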

 

# Chapter 7. Patterns, Actions, and Variables

 

## Pattern Elements

 

- kinds of pattern elements

- _`/regex/`_

- _`expression`_

match when its value is nonzero or non-null

- _`begpat, endpat`_

- `BEGIN`, `END`

- `BEGINFILE`, `ENDFILE`

- _`empty`_

 

match every input record

 

- the expression in pattern is reevaluated each time the rule is tested against a new input record

- boolean expressions are also commonly used as patterns. For example:

 

```awk

awk '/edu/ && /li/' mail-list

# matches records that contain both 'edu' and 'li'

```

- `begpat, endpat`

 

- when a record matches `begpat`, the range pattern is turned on. While it stays on, it matches every record read, up to and including the record that matches `endpat`, which turns it off again.

- If a record matches both `begpat` and `endpat`, the action is executed for just that record.

- range patterns do not combine with other patterns
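
A sketch of a range pattern. The marker strings `BEGIN_BLOCK` and `END_BLOCK` are arbitrary text chosen for the example, not awk keywords.

```shell
# Everything from the first matching line to the closing one, inclusive.
range=$(printf '%s\n' 'intro' 'BEGIN_BLOCK' 'body' 'END_BLOCK' 'outro' |
    awk '/BEGIN_BLOCK/,/END_BLOCK/')
printf '%s\n' "$range"

# A record that matches both ends turns the range on and off at once,
# so only that record is selected.
single=$(printf '%s\n' 'a' 'x y' 'b' | awk '/x/,/y/')
printf '%s\n' "$single"
```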

 

- input/output from `BEGIN` and `END` rules

- no input record for `BEGIN`

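- traditionally `$0` and `NF` were undefined inside an `END` rule. POSIX requires `NF` to keep its value from the last record, and `gawk` also preserves `$0`, but portable programs should not rely on `$0` there.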
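
A minimal sketch of how `BEGIN`, ordinary rules, and `END` fit together, using three invented numbers as input.

```shell
summary=$(printf '3\n4\n5\n' | awk '
    BEGIN { print "summing" }                    # runs before any input is read
    { total += $1 }                              # runs once per record
    END { print NR " records, total " total }    # runs after the last record
')
printf '%s\n' "$summary"
```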

 

## Actions

 

- an action is enclosed in braces (`{...}`); statements are separated by newlines or semicolons.

 

## Control Statements in Actions

 

- `if-else`

 

```awk

if (condition) then-body [else else-body]

```

 

- `for`

 

```awk

for (initialization; condition; increment)

body

```

- It isn’t possible to set more than one variable in the initialization part without using a multiple assignment statement such as ‘x = y = 0’.
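
A sketch combining `if-else` and `for` over the fields of one record. The counter names `pos` and `neg` and the input numbers are illustrative. The multiple assignment form is used to reset both counters.

```shell
signs=$(printf '4 -2 7 -1\n' | awk '{
    pos = neg = 0                      # multiple assignment: both start at 0
    for (i = 1; i <= NF; i++) {
        if ($i > 0)
            pos++
        else
            neg++
    }
    print pos " positive, " neg " non-positive"
}')
printf '%s\n' "$signs"
```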

 

- `switch`

 

```awk

switch (expression) {

case value or regular expression:

case-body

default:

default-body

}

```

 

- `next`

 

- the `next` statement forces `awk` to immediately stop processing the current record and go on to the next record. The rest of the current rule's action is not executed, and no later rules are tried for that record.

 

- `nextfile`

- stop processing the current data file

 

- `exit`

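- when an `exit` statement is executed in a `BEGIN` rule or in an ordinary rule, `awk` stops processing input and runs the `END` rule, if there is one; an `exit` inside the `END` rule stops the program immediately.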
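
The interaction of `next` and `exit` can be sketched as follows, assuming a small input where `#` starts a comment line and the word `stop` is a made-up terminator.

```shell
flow=$(printf '%s\n' '# comment' 'a' 'stop' 'b' | awk '
    /^#/   { next }            # skip comment lines; later rules are not tried
    /stop/ { exit }            # stop reading input, then run END
           { print "line: " $0 }
    END    { print "END ran after " NR " records" }
')
printf '%s\n' "$flow"
```

The record `b` is never read, because `exit` ends input processing at the third record.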

 

## Predefined Variables

 

- _`CONVFMT`_

- _`FIELDWIDTHS`_

- _`FPAT`_

> A regular expression (as a string) that tells gawk to create the fields based on text that matches the regular expression.

- _`FS`_

- _`OFMT`_

- _`OFS`_

- _`ORS`_

- _`RS`_

- _`ARGC, ARGV`_

- _`ARGIND`_

> The index in _`ARGV`_ of the current file being processed.

- _`ENVIRON`_

- _`FILENAME`_

- _`FNR`_

> The current record number in the current file

- _`NF`_

> The number of fields in the current input record

- _`NR`_

> The number of input records awk has processed since the beginning of the program’s execution
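
A sketch contrasting `FNR`, `NR`, and `FILENAME` across two input files. The file names and contents are hypothetical and live in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n' > "$tmpdir/two.txt"
# cd first so FILENAME shows the short names given on the command line.
counters=$(cd "$tmpdir" && awk '{ print FILENAME, FNR, NR }' one.txt two.txt)
printf '%s\n' "$counters"
rm -rf "$tmpdir"
```

`FNR` restarts at 1 for the second file while `NR` keeps counting.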

 

# Chapter 8. Arrays in _`AWK`_

 

## The Basics of Arrays

 

- any number or string, not just consecutive integers, may be used as an array index.

- each array in _`awk`_ is a collection of pairs—an index and its corresponding array element value

- When awk creates an array (e.g., with the split() built-in function), **_that array’s indices are consecutive integers starting at one_**.

- refer to an array element

```awk

array[index-expression]

```

- _A reference to an element that does not exist automatically creates that array element, with the null string as its value._

- check if an element exists at a certain index:

```awk

# index in array

if (2 in frequencies)

print "Subscript 2 is present."

```
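
A sketch of why `in` is the safe membership test: a plain reference creates the element as a side effect. The array name `a` and the key `k` are arbitrary.

```shell
membership=$(awk 'BEGIN {
    before = ("k" in a)     # membership test, no side effect
    value  = a["k"]         # a plain reference creates a["k"] (empty string)
    after  = ("k" in a)
    print before, after
}')
printf '%s\n' "$membership"
```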

- loop over an array

```awk

for (indx in array)

body

```

- the awk implementation determines the order in which the array is traversed.

- set `PROCINFO["sorted_in"]` to specify the order in which _`awk`_ traverses an array [**_only for gawk_**]

- "@unsorted" _default_

- "@ind_str_asc"

- "@ind_num_asc"

- "@val_type_asc"

- and so on ...
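
A word-count sketch using a string-indexed array and a `for (... in ...)` loop. Because traversal order is left to the implementation, the output is piped through `sort` here. That is a portable alternative chosen for this example, not the gawk-only `PROCINFO["sorted_in"]` mechanism.

```shell
freq=$(printf 'b a b\nc b a\n' | awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (word in count) print word, count[word] }
' | LC_ALL=C sort)
printf '%s\n' "$freq"
```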

 

## `delete` statement

 

```awk

delete array[index-expression]

```

- It is not an error to delete an element that does not exist.

- delete all elements:

```awk

delete array

```
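
A sketch of both forms of `delete`, counting the elements with a loop after each step. The array name `parts` and its contents are made up.

```shell
left=$(awk 'BEGIN {
    n = split("x y z", parts, " ")     # parts[1], parts[2], parts[3]
    delete parts[2]
    delete parts[99]                   # deleting a missing element is not an error
    kept = 0
    for (i in parts) kept++
    delete parts                       # remove every element
    remaining = 0
    for (i in parts) remaining++
    print n, kept, remaining
}')
printf '%s\n' "$left"
```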

 

## Multidimensional Arrays

 

- Multidimensional arrays are supported in awk through concatenation of indices into one string. The separator used for the combination is the value of the built-in variable _`SUBSEP`_
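
A sketch of a two-subscript array. The names `cell`, `key`, and `idx` are illustrative. Splitting the stored key on `SUBSEP` recovers the original subscripts.

```shell
grid=$(awk 'BEGIN {
    cell[1, 2] = "x"                   # stored under the key 1 SUBSEP 2
    if ((1, 2) in cell) print "found"
    for (key in cell) {
        split(key, idx, SUBSEP)        # recover the two subscripts
        print idx[1], idx[2], cell[key]
    }
}')
printf '%s\n' "$grid"
```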

 

# Functions

 

TODO
