Linux Shell Scripting Cookbook: Notes on Common Linux Commands (Part 3)

CommiYou

Article link: https://blog.csdn.net/commiyou/article/details/16917987

I. cut

1. cut -f FIELD_LIST filename: cut slices each line by column. Each column is called a field, and the field numbers in FIELD_LIST are separated by commas.

$ cut -f 2,3 filename          # Print the 2nd and 3rd fields of every line of filename to stdout.

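2. Tab is the default delimiter. By default, lines that contain no delimiter are printed unchanged; use the -s option to suppress them.

Use the -d option to specify a different delimiter, for example: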

$ cat delimited_data.txt
No;Name;Mark;Percent
1;Sarath;45;90
2;Alex;49;98
3;Anu;45;90
$ cut -f2 -d";" delimited_data.txt
Name
Sarath
Alex
Anu

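Also note that cut splits on every single delimiter, so consecutive delimiters are not merged: each one starts a new field, and the fields between them are empty.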
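
The two behaviours above (-s and repeated delimiters) can be checked with a small sketch. The sample data and variable names here are made up for illustration and are not from the book:

```shell
# Hypothetical sample: two delimited lines and one line without any ";".
data='a;;c
no delimiter here
x;y;z'

# Without -s, the line that has no delimiter is printed unchanged.
with_plain=$(printf '%s\n' "$data" | cut -d';' -f3)

# With -s, lines that contain no delimiter are skipped.
only_delimited=$(printf '%s\n' "$data" | cut -s -d';' -f3)

# Field 2 of "a;;c" is empty, because each ";" starts a new field.
empty_field=$(printf 'a;;c\n' | cut -d';' -f2)

printf '%s\n' "$with_plain"
printf '%s\n' "$only_delimited"
printf '[%s]\n' "$empty_field"
```

If repeated delimiters should count as one, a common workaround is to squeeze them first (for example with tr -s) before piping into cut.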

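3. A range can be selected with the form "-x N-M", where x is b (bytes), c (characters) or f (fields).

To select several ranges, separate them with ",". With -b and -c the selected ranges are printed back to back with nothing in between; use the --output-delimiter option to put a separator between them.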

$ cat range_fields.txt
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxy
$ cut -c-2,4-5 range_fields.txt --output-delimiter ","      # Print only the first 2 characters and characters 4-5, separated by a comma.
ab,de
ab,de
ab,de
ab,de
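
Ranges work the same way for fields. The sketch below reuses one row of delimited_data.txt from above and assumes GNU cut, since --output-delimiter is a GNU coreutils option; the variable names are illustrative:

```shell
line='1;Sarath;45;90'

# Fields 1-2 and 4, joined with " | " instead of the input delimiter ";".
picked=$(printf '%s\n' "$line" | cut -d';' -f1-2,4 --output-delimiter=' | ')

# "N-" means "from field N to the end of the line".
tail_fields=$(printf '%s\n' "$line" | cut -d';' -f3-)

printf '%s\n' "$picked"
printf '%s\n' "$tail_fields"
```

Without --output-delimiter, cut -f joins the selected fields with the input delimiter, which is why the second command keeps the ";".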

II. sed

1. sed is a stream editor, most often used for text substitution.

2. sed reads from stdin or from a named file and writes to stdout.

$ cat file | sed 's/pattern/replace_string/'     # Read the input from a pipe
$ sed 's/text/replace/' file > newfile           # Save the result with a redirection
$ mv newfile file
$ sed -i 's/text/replace/' file                  # Use -i to write the changes back to the source file

3. Note that the commands above replace only the first match on each line. To replace every match on a line, add the g flag.

$ sed 's/pattern/replace_string/g' file

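You can also start replacing from the Nth occurrence onward, still counted per line (combining N with g is a GNU sed extension):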

$ cat cut.test
1 2 2 2
1  2 2
1   2 2
$ sed 's/2/3/2g' cut.test
1 2 3 3
1  2 3
1   2 3
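
To make the difference between the numeric flag alone and the numeric flag combined with g visible, here is a small sketch on a made-up input line. It assumes GNU sed for the 2g form:

```shell
line='a-a-a-a'

# A bare number replaces only that occurrence on the line.
second_only=$(printf '%s\n' "$line" | sed 's/a/X/2')

# Number plus g replaces that occurrence and every later one (GNU sed).
from_second=$(printf '%s\n' "$line" | sed 's/a/X/2g')

printf '%s\n' "$second_only"
printf '%s\n' "$from_second"
```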

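3. In the sed commands above, "/" serves as the delimiter. Any character can be used as the delimiter; when the delimiter character appears inside the pattern, escape it with "\".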

sed 's:text:replace:g'           # ":" as the delimiter
sed 's|text|replace|g'           # "|" as the delimiter
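
Choosing another delimiter pays off mostly when the pattern contains slashes, as in file paths. A minimal sketch with a hypothetical path:

```shell
path='/usr/local/bin'

# With "/" as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With "|" as the delimiter, the same substitution needs no escaping.
piped=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

printf '%s\n' "$escaped"
printf '%s\n' "$piped"
```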

4. Use sed '/pattern/d' filename to delete every line that matches the pattern.

$ sed '/^$/d' file               # Delete empty lines
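
Note that /^$/ matches only lines that are completely empty; a line holding just spaces or tabs survives. The sketch below, on made-up config-style input, uses a POSIX character class to drop whitespace-only lines as well, and a second d command to drop comment lines:

```shell
# Hypothetical input: a comment, an empty line, a whitespace-only line, two settings.
cleaned=$(printf '# comment\n\nport=8080\n   \nhost=example\n' |
  sed '/^[[:space:]]*$/d; /^#/d')

printf '%s\n' "$cleaned"
```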

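5. The '&' marker stands for the text that the pattern matched, for example:

$ echo this is an example | sed 's/\w\+/[&]/g'            # \w matches one word character (see note [1]); \+ means "one or more of the preceding item". Both are regular expression syntax.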
[this] [is] [an] [example]

6. The '\1' marker refers to a captured substring (group) of the match.

$ echo this is digit 7 in a number | sed 's/digit \([0-9]\)/\1/'
this is 7 in a number

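The command above replaces "digit 7" with "7"; the captured substring is 7. \(pattern\) captures a substring. The first captured group is referenced as '\1', the second as '\2', and so on.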

$ echo seven EIGHT | sed 's/\([a-z]\+\) \([A-Z]\+\)/\2 \1/'
EIGHT seven

\([a-z]\+\) matches the first word and \([A-Z]\+\) matches the second word; \1 and \2 refer back to them.
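
Backreferences are handy for reordering parts of a string. As a sketch, the following rewrites a made-up ISO date into day/month/year form using three groups and the POSIX \{n\} repetition:

```shell
date_iso='2013-11-24'

# \1 = year, \2 = month, \3 = day; the "/" in the replacement is escaped
# because "/" is also the delimiter of the s command.
date_dmy=$(printf '%s\n' "$date_iso" |
  sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\3\/\2\/\1/')

printf '%s\n' "$date_dmy"
```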

7. Combining multiple expressions. The following two forms give the same result:

$ sed 'expression' | sed 'expression'
$ sed 'expression; expression'
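
A concrete version of the two forms, plus the -e option, which is a third common way to chain expressions. The input text is made up for illustration:

```shell
text='hello world'

piped=$(printf '%s\n' "$text" | sed 's/hello/goodbye/' | sed 's/world/moon/')
semicolon=$(printf '%s\n' "$text" | sed 's/hello/goodbye/; s/world/moon/')
dash_e=$(printf '%s\n' "$text" | sed -e 's/hello/goodbye/' -e 's/world/moon/')

printf '%s\n' "$piped" "$semicolon" "$dash_e"
```

The single-process forms avoid starting a second sed, which matters mainly on large inputs or in loops.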

III. awk

1. The structure of an awk script is:

awk 'BEGIN{ print "start" } pattern { commands } END{ print "end" }' file

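A script has three blocks: BEGIN, END, and a common statements block guarded by an optional pattern. All three are optional.

The BEGIN block runs once, before awk reads any input. Then pattern { commands } runs for every line read from file or stdin. The END block runs after all input has been read.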
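
As a sketch of how the three blocks cooperate, the following sums the Mark column for the rows whose Percent is 90. The rows are taken from delimited_data.txt in the cut section (header omitted); the variable names are illustrative:

```shell
marks='1;Sarath;45;90
2;Alex;49;98
3;Anu;45;90'

summary=$(printf '%s\n' "$marks" | awk -F';' '
BEGIN { total = 0; count = 0 }            # runs once, before any input
$4 == 90 { total += $3; count++ }         # runs for each line where field 4 is 90
END { print count " rows, total " total } # runs once, after the last line
')

printf '%s\n' "$summary"
```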

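2. awk's special variables include NR (number of records, i.e. the current line number), NF (number of fields in the current line, split on whitespace by default), $0 (the whole current line) and $1 (the first field of the current line; $n works the same way for the nth field).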

$ echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | \
awk '{
print "Line no:"NR",No of fields:"NF, "$0="$0, "$1="$1,"$2="$2,"$3="$3 
}' 
Line no:1,No of fields:3 $0=line1 f2 f3 $1=line1 $2=f2 $3=f3 
Line no:2,No of fields:3 $0=line2 f4 f5 $1=line2 $2=f4 $3=f5 
Line no:3,No of fields:3 $0=line3 f6 f7 $1=line3 $2=f6 $3=f7
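
Because NF holds the field count, $NF is always the last field of the current line, however many fields it has. A short sketch on made-up input:

```shell
report=$(printf 'a b c\nd e\n' |
  awk '{ print NR ": " NF " fields, last=" $NF ", first=" $1 }')

printf '%s\n' "$report"
```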

3. Use the -v option to pass external values into awk.

$ VAR=10000
$ echo | awk -v VARIABLE="$VAR" '{ print VARIABLE }'      # Inside the block the variable is referenced directly, without a $
10000
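
Two more sketches of -v, with made-up values: quoting the shell variable keeps a value with spaces intact, and a variable passed this way can be used in a pattern as well as in an action:

```shell
greeting='hello world'

# Quoting "$greeting" passes the whole string as one value.
echoed=$(awk -v msg="$greeting" 'BEGIN { print msg }')

# min is available in the pattern; lines whose first field exceeds it are printed.
above=$(printf '5\n15\n25\n' | awk -v min=10 '$1 > min')

printf '%s\n' "$echoed"
printf '%s\n' "$above"
```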

4. Use getline to read a line.

$ seq 5 | awk 'BEGIN { getline; print "Read ahead first line", $0 } { print $0 }'
Read ahead first line 1
2
3
4
5

5. Select lines with conditions

$ awk 'NR < 5' # Line number less than 5
$ awk 'NR==1,NR==4' # Line numbers from 1 to 4
$ awk '/linux/' # Lines containing the pattern linux (we can specify regex)
$ awk '!/linux/' # Lines not containing the pattern linux
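
The four selectors can be tried on a small made-up input. A pattern with no action prints the matching line, so no { print } is needed:

```shell
sample=$(printf 'one\nlinux two\nthree\nfour linux\nfive\n')

first_two=$(printf '%s\n' "$sample" | awk 'NR < 3')         # lines 1-2
range=$(printf '%s\n' "$sample" | awk 'NR==2,NR==4')        # lines 2-4, inclusive
matching=$(printf '%s\n' "$sample" | awk '/linux/')         # lines containing "linux"
not_matching=$(printf '%s\n' "$sample" | awk '!/linux/')    # all other lines

printf '%s\n' "$range"
```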

6. The default field separator is whitespace (runs of spaces and tabs); use -F to specify a different one.

$ awk -F: '{ print $NF }' /etc/passwd
Or:
awk 'BEGIN { FS=":" } { print $NF }' /etc/passwd

Use OFS='delimiter' to set the separator that is placed between fields on output.
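
A sketch of OFS on one row of delimited_data.txt. OFS is inserted between the comma-separated arguments of print, and it is also used when awk rebuilds $0 after a field has been assigned; the $1 = $1 idiom forces that rebuild:

```shell
row='1;Sarath;45;90'

# OFS separates the arguments of print.
converted=$(printf '%s\n' "$row" | awk 'BEGIN { FS=";"; OFS="," } { print $2, $3 }')

# Assigning any field makes awk rejoin the whole line with OFS.
rebuilt=$(printf '%s\n' "$row" | awk 'BEGIN { FS=";"; OFS="," } { $1 = $1; print }')

printf '%s\n' "$converted"
printf '%s\n' "$rebuilt"
```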

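Notes:

[1] Regular expressions fall into three groups: the basic components of regular expressions (regex), the POSIX character classes, and meta characters.

The basic components are the familiar ones, such as ^, $, ., [], [^], [-], ?, +, *, (), {n}, {n,}, {n,m}, |, \ and so on.

POSIX character classes have the form [:...:], for example [:alnum:], [:alpha:], [:blank:], [:digit:], [:lower:], [:upper:], [:punct:], [:space:] and so on.

Any tool that supports regular expressions supports the two groups above, but the same is not true of meta characters.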

Meta characters are a type of Perl-style regular expression that is supported by a subset of text processing utilities. Not all of the utilities will support the following notations.

Meta characters
Regex	Description	Example
\b	Word boundary	"\bcool\b" matches only "cool" not "coolant".
\B	Non-word boundary	"cool\B" matches "coolant" not "cool".
\d	Single digit character	"b\db" matches "b2b" not "bcb".
\D	Single non-digit	"b\Db" matches "bcb" not "b2b".
\w	Single word character (alnum and _)	"\w" matches "1" or "a" not "&".
\W	Single non-word character	"\W" matches "&" not "1" or "a".
\n	Newline	"\n" matches a new line.
\s	Single whitespace	"x\sx" matches "x x" not "xx".
\S	Single non-space	"x\Sx" matches "xkx" not "xx".
\r	Carriage return	"\r" matches carriage return.

The material above is excerpted from Linux Shell Scripting Cookbook; for further study, see also the Wikipedia article "Regular expression".

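[2] As for which characters in sed need a preceding '\' to escape them, see the Stack Exchange question "What characters do I need to escape when using sed in a sh script?". Gilles's answer is a good one: first mind the shell (here we usually wrap the expression in single quotes '), then mind BRE: the characters $.*[\]^ have a special meaning in BRE and must be escaped to be matched literally. There are further cases as well; the answer covers them in detail.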
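
As a small illustration of the BRE point, with a made-up input string: an unescaped "." matches any character, while "\." matches only a literal dot. The single quotes keep the shell from touching the backslash:

```shell
versions='v1.2 vs v1x2'

# "." is a wildcard, so both "1.2" and "1x2" are replaced.
unescaped=$(printf '%s\n' "$versions" | sed 's/1.2/N/g')

# "\." matches only the literal dot.
escaped=$(printf '%s\n' "$versions" | sed 's/1\.2/N/g')

printf '%s\n' "$unescaped"
printf '%s\n' "$escaped"
```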