Handy one-liners for awk: an awk one-liner handbook

yangyinbo

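I came across "Handy one-liners for awk" on ricky.zhu's blog and translated it. Working through it was worthwhile, although there are still a few parts I do not fully understand.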

HANDY ONE-LINERS FOR AWK 22 July 2003
compiled by Eric Pement version 0.22
Latest version of this file is usually at:

http://www.student.northpark.edu/pemente/awk/awk1line.txt

USAGE:

   Unix: awk '/pattern/ {print "$1"}'    # standard Unix shells
DOS/Win: awk '/pattern/ {print "$1"}'    # okay for DJGPP compiled
         awk "/pattern/ {print \"$1\"}"  # required for Mingw32

Translator's note: this was the first time I learned that awk can be used on DOS/Windows at all. DJGPP and Mingw32 were both new to me.

Most of my experience comes from a version of GNU awk (gawk) compiled for
Win32. Note in particular that DJGPP compilations permit the awk script
to follow Unix quoting syntax '/like/ {"this"}'. However, the user must
know that single quotes under DOS/Windows do not protect the redirection
arrows (<, >) nor do they protect pipes (|). Both are special symbols
for the DOS/CMD command shell and their special meaning is ignored only
if they are placed within "double quotes." Likewise, DOS/Win users must
remember that the percent sign (%) is used to mark DOS/Win environment
variables, so it must be doubled (%%) to yield a single percent sign
visible to awk.

If I am sure that a script will NOT need to be quoted in Unix, DOS, or
CMD, then I normally omit the quote marks. If an example is peculiar to
GNU awk, the command 'gawk' will be used. Please notify me if you find
errors or new commands to add to this list (total length under 65
characters). I usually try to put the shortest script first.

FILE SPACING:

# double space a file
awk '1;{print ""}'
awk 'BEGIN{ORS="\n\n"};1'

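Note: the semicolon in the first form cannot be omitted. The reason lies in how awk pairs patterns with actions. The usual form is

awk '/pattern/ {print "$1"}'

The pattern position only needs an expression that is true or false; it does not have to be a regular expression.

Here the pattern is the constant 1: 0 is false and any non-zero value is true. A pattern with no action gets the default action print, and print with no arguments prints $0.

Written as

awk '1 {print}'

awk '1 {print ""}'

awk '1;{print}'

the three commands behave very differently: the first prints each line once, the second prints only an empty line for each input line, and the third prints each line twice.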
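
The difference between those forms can be checked directly. The following is a minimal sketch, assuming a POSIX-style shell and any standard awk; the two-line sample input and the variable names are made up for this illustration.

```shell
# Two-line sample input (hypothetical, used only for this illustration).
input() { printf 'a\nb\n'; }

# Pattern 1 with an explicit action: each line is printed once.
once=$(input | awk '1 {print}')

# Pattern 1 with an action that prints an empty string:
# only empty lines come out, one per input line.
blank=$(input | awk '1 {print ""}' | wc -l)

# Pattern 1 alone (default print), then a second rule that prints again:
# every line appears twice.
twice=$(input | awk '1;{print}')

# The double-spacing one-liner: default print, then an empty line.
spaced=$(input | awk '1;{print ""}')

printf '%s\n---\n%s\n---\n%s\n' "$once" "$twice" "$spaced"
```

Without the semicolon, `1 {print ""}` is a single rule whose action replaces the default print, which is why the original lines vanish.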

# double space a file which already has blank lines in it. Output file
# should contain no more than one blank line between lines of text.
# NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are
# often treated as non-blank, and thus 'NF' alone will return TRUE.
awk 'NF{print $0 "\n"}'

Note: on a blank line NF is 0, so the pattern is false and the action is not run.

# triple space a file
awk '1;{print "\n"}'

Note: the earlier awk '1;{print ""}' adds one blank line after each line. This version adds two, because print "\n" writes the newline contained in the string and then the output record separator, which is itself a newline.
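
Why the triple-spacing script produces two blank lines can be seen by counting output lines. This is a small sketch, assuming a POSIX-style shell and the default ORS; the sample inputs and variable names are invented here.

```shell
# print "" emits only ORS, i.e. one newline.
one=$(printf 'x\n' | awk '{print ""}' | wc -l)

# print "\n" emits the newline in the string AND ORS, i.e. two newlines.
two=$(printf 'x\n' | awk '{print "\n"}' | wc -l)

# For a 2-line input the triple-spacing one-liner therefore writes
# 3 output lines per input line.
triple=$(printf 'a\nb\n' | awk '1;{print "\n"}' | wc -l)

echo "print \"\": $one line, print \"\\n\": $two lines, triple spacing: $triple lines"
```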

NUMBERING AND CALCULATIONS:

# precede each line by its line number FOR THAT FILE (left alignment).
# Using a tab (\t) instead of space will preserve margins.
awk '{print FNR "\t" $0}' files*

# precede each line by its line number FOR ALL FILES TOGETHER, with tab.
awk '{print NR "\t" $0}' files*

Note: these two are worth comparing. FNR restarts at 1 for each input file, whereas NR keeps counting across all files.
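
The contrast between FNR and NR only shows up with more than one input file. The sketch below assumes a POSIX-style shell with mktemp available; the two throwaway files and their contents are made up for the example.

```shell
# Create two small throwaway files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\nd\n' > "$dir/f2"

# Print the per-file counter, the global counter, and the line itself.
result=$(awk '{print FNR ":" NR ":" $0}' "$dir/f1" "$dir/f2")
printf '%s\n' "$result"

# Clean up the temporary files.
rm -rf "$dir"
```

On the second file FNR goes back to 1 while NR continues from 3.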

# number each line of a file (number on left, right-aligned)
# Double the percent signs if typing from the DOS command prompt.
awk '{printf("%5d : %s\n", NR,$0)}'

# number each line of file, but only print numbers if line is not blank
# Remember caveats about Unix treatment of \r (mentioned above)
awk 'NF{$0=++a " :" $0};{print}'
awk '{print (NF? ++a " :" :"") $0}'

Note: the first form assigns to $0 on non-blank lines, putting the line number in front of the original text, and then prints every line.

The second form checks whether each line is blank to decide whether to print a number, and then prints the line itself.
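
Both numbering forms should give the same result on the same input. A quick sketch, assuming a POSIX-style shell; the three-line sample with a blank middle line is invented for the check.

```shell
# Sample input: a blank line sits between two text lines.
input() { printf 'x\n\ny\n'; }

# Form 1: rewrite $0 on non-blank lines, then print every line.
first=$(input | awk 'NF{$0=++a " :" $0};{print}')

# Form 2: choose the prefix with a conditional expression.
second=$(input | awk '{print (NF? ++a " :" :"") $0}')

printf '%s\n' "$first"
```

The counter a advances only on non-blank lines, so the blank line neither receives nor consumes a number.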

# count lines (emulates "wc -l")
awk 'END{print NR}'

# print the sums of the fields of every line
awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}'

# add all fields in all lines and print the sum
awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}'

# print every line after replacing each field with its absolute value
awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }'
awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? -$i : $i; print }'

# print the total number of fields ("words") in all lines
awk '{ total = total + NF }; END {print total}' file

# print the total number of lines that contain "Beth"
awk '/Beth/{n++}; END {print n+0}' file

# print the largest first field and the line that contains it
# Intended for finding the longest string in field #1
awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'

# print the number of fields in each line, followed by the line
awk '{ print NF ":" $0 } '

# print the last field of each line
awk '{ print $NF }'

# print the last field of the last line
awk '{ field = $NF }; END{ print field }'

# print every line with more than 4 fields
awk 'NF > 4'

# print every line where the value of the last field is > 4
awk '$NF > 4'

TEXT CONVERSION AND SUBSTITUTION:

# IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
awk '{sub(/\r$/,"");print}'   # assumes EACH line ends with Ctrl-M

# IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk '{sub(/$/,"\r");print}'

# IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk 1

# IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
# Cannot be done with DOS versions of awk, other than gawk:
gawk -v BINMODE="w" '1' infile >outfile

# Use "tr" instead.
tr -d \r <infile >outfile            # GNU tr version 1.22 or higher

# delete leading whitespace (spaces, tabs) from front of each line
# aligns all text flush left
awk '{sub(/^[ \t]+/, ""); print}'

# delete trailing whitespace (spaces, tabs) from end of each line
awk '{sub(/[ \t]+$/, "");print}'

# delete BOTH leading and trailing whitespace from each line
awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'
awk '{$1=$1;print}'           # also removes extra space between fields

Note on the second form: assigning to any field, even $1=$1, makes awk rebuild $0 from its fields joined by the output field separator (a single space by default), so leading and trailing whitespace disappears and runs of whitespace between fields collapse to one space.
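
The two trimming forms differ in what they do to whitespace inside the line. A minimal sketch, assuming a POSIX-style shell and default FS/OFS; the sample line is made up, and the square brackets are added only to make the line edges visible.

```shell
# Sample line with leading, trailing and repeated inner spaces.
line='  a   b  '

# gsub trims both ends but leaves the inner run of spaces alone.
trimmed=$(printf '%s\n' "$line" | awk '{gsub(/^[ \t]+|[ \t]+$/, ""); print "[" $0 "]"}')

# $1=$1 forces $0 to be rebuilt from the fields joined by OFS.
rebuilt=$(printf '%s\n' "$line" | awk '{$1=$1; print "[" $0 "]"}')

printf '%s\n%s\n' "$trimmed" "$rebuilt"
```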

# insert 5 blank spaces at beginning of each line (make page offset)
awk '{sub(/^/, "     ");print}'

# align all text flush right on a 79-column width
awk '{printf "%79s\n", $0}' file*

# center all text on a 79-character width
awk '{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file*

# substitute (find and replace) "foo" with "bar" on each line
awk '{sub(/foo/,"bar");print}'           # replaces only 1st instance
gawk '{$0=gensub(/foo/,"bar",4);print}'  # replaces only 4th instance
awk '{gsub(/foo/,"bar");print}'          # replaces ALL instances in a line

# substitute "foo" with "bar" ONLY for lines which contain "baz"
awk '/baz/{gsub(/foo/, "bar")};{print}'

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
awk '!/baz/{gsub(/foo/, "bar")};{print}'

# change "scarlet" or "ruby" or "puce" to "red"
awk '{gsub(/scarlet|ruby|puce/, "red"); print}'

# reverse order of lines (emulates "tac")
awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file*

Note: the lines are stored in an array and then printed in reverse order, which is simple enough. It seemed to me that this might not work for more than 1023 lines?
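
The 1023-line question can be tested rather than guessed at. The sketch below assumes a POSIX-style shell and a reasonably current awk (gawk, mawk, or similar); whether some older or memory-constrained awk has such a limit is not something this check can rule out. The 2000-line input is generated on the fly.

```shell
# Generate the numbers 1..2000, one per line, and reverse them.
reversed=$(awk 'BEGIN{for (i=1; i<=2000; i++) print i}' |
           awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--]}')

# Inspect the first line, the last line, and the total line count.
first=$(printf '%s\n' "$reversed" | sed -n '1p')
last=$(printf '%s\n' "$reversed" | sed -n '$p')
count=$(printf '%s\n' "$reversed" | wc -l)

echo "first=$first last=$last count=$count"
```

If all 2000 lines come back in reverse order, the awk in use has no 1023-line ceiling for this script.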

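# if a line ends with a backslash, append the next line to it
# (fails if there are multiple lines ending with backslash...)
awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file*

Note: when a line ending in a backslash is matched, the backslash is removed, the following line is read with getline, the two are printed together, and processing moves on to the next record. If two consecutive lines end with a backslash, the backslash of the second one still appears in the output.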
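
The failure with consecutive continuation lines is easy to reproduce. The sketch below assumes a POSIX-style shell; the sample inputs are invented, and the looping variant at the end is not from the source, only one possible way to keep joining while the line still ends in a backslash.

```shell
# Single continuation: "one \" followed by "two" is joined correctly.
single=$(printf 'one \\\ntwo\nthree\n' |
         awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1')

# Two consecutive continuations: the second backslash survives.
double=$(printf 'a\\\nb\\\nc\n' |
         awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1')

# Hypothetical variant: loop until the current record no longer ends
# with a backslash (or the input runs out).
looped=$(printf 'a\\\nb\\\nc\n' |
         awk '{while (/\\$/) {sub(/\\$/,""); if ((getline t) > 0) $0 = $0 t; else break}; print}')

printf '%s\n---\n%s\n---\n%s\n' "$single" "$double" "$looped"
```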

# print and sort the login names of all users
awk -F ":" '{ print $1 | "sort" }' /etc/passwd

# print the first 2 fields, in opposite order, of every line
awk '{print $2, $1}' file

# switch the first 2 fields of every line
awk '{temp = $1; $1 = $2; $2 = temp; print}' file

Note: this is the familiar temporary-variable swap from C. The swap only changes the fields in memory, so a print is needed for the result to appear.

# print every line, deleting the second field of that line
awk '{ $2 = ""; print }'

# print in reverse order the fields of every line
awk '{for (i=NF; i>0; i--) printf("%s ",$i);printf("\n")}' file

Note: the fields are emitted as $n ... $2 $1 instead of $1 $2 ... $n, which reverses them.
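
In the field-reversal loop the dollar sign in front of the loop variable matters: `$i` is the content of field i, while a bare `i` is just the index. A short sketch, assuming a POSIX-style shell; the sample line is made up.

```shell
# With $i the field contents are printed in reverse order.
withdollar=$(printf 'a b c\n' |
             awk '{for (i=NF; i>0; i--) printf("%s ", $i); printf("\n")}')

# With a bare i only the field numbers are printed.
indexonly=$(printf 'a b c\n' |
            awk '{for (i=NF; i>0; i--) printf("%s ", i); printf("\n")}')

printf '[%s]\n[%s]\n' "$withdollar" "$indexonly"
```

Note that each field is followed by a space, so the output line ends with one trailing space.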

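# remove duplicate, consecutive lines (emulates "uniq")
awk 'a !~ $0; {a=$0}'

Note: on the first line the variable a is empty, so it does not match a non-empty $0 and the line is printed; a is then set to $0. From then on a holds the previous line. If the second line is the same as the first, a matches it and nothing is printed; either way a is reassigned. Because !~ treats $0 as a regular expression rather than a plain string, lines that contain regex metacharacters may not be handled exactly as uniq would handle them.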
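
Since `!~` is a regular-expression match, the uniq emulation can drop lines that merely match the previous line as a pattern. The sketch below assumes a POSIX-style shell; the sample input is invented, and the string-comparison variant is not from the source, only one possible alternative (the `""` concatenation is there to force a string rather than numeric comparison).

```shell
# "a.c" differs from "abc" as text, but as a regex it matches "abc".
input() { printf 'abc\na.c\na.c\n'; }

# The one-liner from the list: the regex match hides the "a.c" line.
regexform=$(input | awk 'a !~ $0; {a=$0}')

# Hypothetical variant using plain string comparison.
stringform=$(input | awk 'NR == 1 || ($0 "") != prev; {prev = $0}')

printf '%s\n---\n%s\n' "$regexform" "$stringform"
```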

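# remove duplicate, nonconsecutive lines
awk '! a[$0]++'                     # most concise script
awk '!($0 in a) {a[$0];print}'      # most efficient script

Note: the most concise script really is concise. a[$0] counts how many times the line has been seen. An unseen line has the value 0, so !a[$0] is true and the line is printed; the ++ then increments the count so that later copies are suppressed.

The most efficient script asks whether the line is already a key in the array and negates the answer: if it is present nothing is printed; if it is absent the key is created and the line is printed.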
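
The role of the array can be made visible by printing the counts it ends up holding. A minimal sketch, assuming a POSIX-style shell; the five-line sample input is made up.

```shell
# Sample input with non-adjacent duplicates.
input() { printf 'x\ny\nx\nz\ny\n'; }

# Concise form: print a line only when its count was still 0.
concise=$(input | awk '!a[$0]++')

# Efficient form: print a line only when it is not yet a key of a.
efficient=$(input | awk '!($0 in a) {a[$0]; print}')

# What the counter array contains after the whole input is read.
counts=$(input | awk '{a[$0]++} END{print a["x"], a["y"], a["z"]}')

printf '%s\ncounts: %s\n' "$concise" "$counts"
```

Both forms keep the first occurrence of each line and preserve the original order.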

# concatenate every 5 lines of input, using a comma separator
# between fields
awk 'ORS=NR%5?",":"\n"' file

SELECTIVE PRINTING OF CERTAIN LINES:

# print first 10 lines of file (emulates behavior of "head")
awk 'NR < 11'

# print first line of file (emulates "head -1")
awk 'NR>1{exit};1'

# print the last 2 lines of a file (emulates "tail -2")
awk '{y=x "\n" $0; x=$0};END{print y}'

Note: after the first line, y holds an empty string, a newline and line 1, and x holds line 1. After the second line, y holds lines 1 and 2 and x holds line 2, and so on. At the end x holds the last line and y holds the last two lines.
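
The x/y bookkeeping can be checked on a small input, including the edge case of a one-line file. A sketch assuming a POSIX-style shell; the sample inputs are invented.

```shell
# Three-line input: y ends up holding the last two lines.
lasttwo=$(printf 'a\nb\nc\n' | awk '{y=x "\n" $0; x=$0}; END{print y}')

# One-line input: x was still empty when y was built, so the output
# is an empty line followed by the only line (2 output lines in total).
single=$(printf 'a\n' | awk '{y=x "\n" $0; x=$0}; END{print y}' | wc -l)

printf '%s\n' "$lasttwo"
echo "lines printed for a one-line file: $single"
```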

# print the last line of a file (emulates "tail -1")
awk 'END{print}'

Note: awk reads through to the last line and then runs the END block.

# print only lines which match regular expression (emulates "grep")
awk '/regex/'

# print only lines which do NOT match regex (emulates "grep -v")
awk '!/regex/'

# print the line immediately before a regex, but not the line
# containing the regex
awk '/regex/{print x};{x=$0}'
awk '/regex/{print (x=="" ? "match on line 1" : x)};{x=$0}'

# print the line immediately after a regex, but not the line
# containing the regex
awk '/regex/{getline;print}'

# grep for AAA and BBB and CCC (in any order)
awk '/AAA/ && /BBB/ && /CCC/'

# grep for AAA and BBB and CCC (in that order)
awk '/AAA.*BBB.*CCC/'

# print only lines of 65 characters or longer
awk 'length > 64'

# print only lines of less than 65 characters
awk 'length < 65'

# print section of file from regular expression to end of file
awk '/regex/,0'
awk '/regex/,EOF'

Note: a range pattern prints everything between a start match and an end match. Here the end condition is never true, so printing runs from the first match to the end of the file.
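
The open-ended range can be compared with a bounded one. A minimal sketch, assuming a POSIX-style shell; the sample input and the START marker are made up. EOF here is not a keyword, only an uninitialized variable, which is why it behaves like 0.

```shell
# Sample input with a marker line in the middle.
input() { printf 'a\nSTART\nb\nc\n'; }

# End condition 0 is never true: print from the match to end of input.
tozero=$(input | awk '/START/,0')

# EOF is an unset variable, so it is also false on every line.
toeof=$(input | awk '/START/,EOF')

# A bounded range stops after the line that matches the end pattern.
bounded=$(input | awk '/START/,/b/')

printf '%s\n---\n%s\n' "$tozero" "$bounded"
```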

# print section of file based on line numbers (lines 8-12, inclusive)
awk 'NR==8,NR==12'

# print line number 52
awk 'NR==52'
awk 'NR==52 {print;exit}'          # more efficient on large files

# print section of file between two regular expressions (inclusive)
awk '/Iowa/,/Montana/'             # case sensitive

SELECTIVE DELETION OF CERTAIN LINES:

# delete ALL blank lines from a file (same as "grep '.' ")
awk NF
awk '/./'

CREDITS AND THANKS:

Special thanks to Peter S. Tillier for helping me with the first release
of this FAQ file.

For additional syntax instructions, including the way to apply editing
commands from a disk file instead of the command line, consult:

"sed & awk, 2nd Edition," by Dale Dougherty and Arnold Robbins
  O'Reilly, 1997
"UNIX Text Processing," by Dale Dougherty and Tim O'Reilly
  Hayden Books, 1987
"Effective awk Programming, 3rd Edition." by Arnold Robbins
  O'Reilly, 2001

To fully exploit the power of awk, one must understand "regular
expressions." For detailed discussion of regular expressions, see
"Mastering Regular Expressions, 2d edition" by Jeffrey Friedl
(O'Reilly, 2002).

The manual ("man") pages on Unix systems may be helpful (try "man awk",
"man nawk", "man regexp", or the section on regular expressions in "man
ed"), but man pages are notoriously difficult. They are not written to
teach awk use or regexps to first-time users, but as a reference text
for those already acquainted with these tools.

USE OF '\t' IN awk SCRIPTS: For clarity in documentation, we have used
the expression '\t' to indicate a tab character (0x09) in the scripts.
All versions of awk, even the UNIX System 7 version should recognize
the '\t' abbreviation.