Shell Programming: Regular Expressions and Text Processors - CSDN Blog

Article link: https://blog.csdn.net/2503_91074196/article/details/147855442

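Overview of Regular Expressions

Definition of Regular Expressions

A regular expression, often abbreviated in code as regex, regexp, or RE, is a single string that describes and matches a set of strings conforming to a syntactic rule. Put simply, it is a way of matching strings: with a handful of special symbols you can quickly find, delete, or replace particular strings. A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text, and acts as a template that is compared against the string being searched.
Ordinary characters include upper- and lowercase letters, digits, punctuation, and some other symbols. Metacharacters are characters with a special meaning inside a regular expression; they can specify how the preceding character (the one immediately before the metacharacter) may occur in the target text.
Regular expressions are mostly used in scripting and in text editors. Many text processors and programming languages support them, for example the common Linux text processors (grep, egrep, sed, awk) and the widely used Python language. Their powerful matching makes it possible to process large volumes of text quickly and efficiently.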

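Uses of Regular Expressions

Ordinary computer users seldom need regular expressions and so rarely get to appreciate them, but for system administrators they are an essential skill.
A running system produces a large amount of information. Some of it is important and some is purely informational. An administrator who reads all of it directly cannot quickly locate the important messages, such as "user login failed" or "service failed to start". Regular expressions make it possible to extract these "problem" messages quickly, which makes operations work simpler and more convenient.
Many software packages also support regular expressions, mail servers being the most common example. On the Internet, spam and advertising mail often causes network congestion; if such mail is rejected on the server side, clients avoid a lot of unnecessary bandwidth use. The widely used mail server postfix, and the analysis software that works with mail servers, support regular expression matching: the subject and body of incoming mail are compared against particular strings, and problem mail is filtered out.
Beyond mail servers, many other server programs support regular expressions. The matching rules, however, still have to be written by the system administrator, which is why regular expressions are a skill every administrator must master.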

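Basic Regular Expressions

Regular expression syntax is divided, by strictness and feature set, into basic regular expressions (BRE) and extended regular expressions (ERE). Basic regular expressions are the foundation of everyday regex use. Among the common Linux text tools, grep and sed use basic regular expressions by default, while egrep and awk use extended regular expressions. To use basic regular expressions you first need to know what their metacharacters mean; the sections below introduce them one by one through grep examples.

Basic Regular Expression Examples

The following operations require a test file named test.txt, prepared in advance with the content shown below.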

[root@localhost ~]# cat test.txt
he was short and fat.
he was weating a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
PI=3.14
a wood cross!
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

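Finding Specific Characters

Finding specific characters is straightforward. The command below finds where the string "the" occurs in test.txt. The "-n" option shows line numbers and "-i" makes the match case-insensitive. After the command runs, the matching characters are shown in red (this chapter uses bold instead).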

[root@localhost ~]# grep -n 'the' test.txt
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.

To invert the selection, for example to find the lines that do not contain "the", use grep's "-v" option, combined with "-n" to show line numbers.

[root@localhost ~]# grep -vn 'the' test.txt
1:he was short and fat.
2:he was weating a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:PI=3.14
7:a wood cross!
8:Actions speak louder than words
9:
10:#woood #
11:#woooooooood #
12:AxyzxyzxyzxyzC
13:I bet this place is really spooky late at night!
14:Misfortunes never come alone/single.
15:I shouldn't have lett so tast.
16:
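
The "-i" option is described above but not exercised in the original examples. The sketch below compares a case-sensitive and a case-insensitive search; it assumes a temporary file holding only the first five lines of test.txt, and the variable names are illustrative.

```shell
# Throwaway sample: the first five lines of test.txt (path chosen by mktemp).
sample=$(mktemp)
cat > "$sample" <<'EOF'
he was short and fat.
he was weating a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
EOF

# Case-sensitive: "The" on line 3 is not matched.
case_sensitive=$(grep -n 'the' "$sample")

# Case-insensitive: -i also matches "The" on line 3.
case_insensitive=$(grep -in 'the' "$sample")

printf '%s\n' "$case_insensitive"
rm -f "$sample"
```

With "-i", line 3 joins lines 4 and 5 in the result.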

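Using Brackets "[]" to Match a Character Set

To find both "shirt" and "short", note that both strings contain "sh" and "rt". The following command finds both at once. However many characters appear inside "[]", the set stands for exactly one character, so "[io]" matches either "i" or "o".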

[root@localhost ~]# grep -n 'sh[io]rt' test.txt
1:he was short and fat.
2:he was weating a blue polo shirt with black pants.

To find lines containing the repeated character "oo", run the following command.

[root@localhost ~]# grep -n 'oo' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
7:a wood cross!
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!

In the output above, "#woood #" and "#woooooooood #" also match, because each of them contains "oo". The matched characters are the ones highlighted in the output.
If you do not want a lowercase letter in front of "oo", use the command "grep -n '[^a-z]oo' test.txt", where "a-z" stands for the lowercase letters; uppercase letters are written "A-Z".

[root@localhost ~]# grep -n '[^a-z]oo' test.txt
3:The home of Football on BBC Sport online.

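To find "oo" that is not preceded by "w", use the negated character set "[^]". For example, "grep -n '[^w]oo' test.txt" searches test.txt for "oo" preceded by any character other than "w". Lines 10 and 11 still appear in the result: in "#woood #" the matched text is "ooo", where the "o" in front of "oo" is itself a character other than "w", so the rule is satisfied. The same holds for "#woooooooood #".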

[root@localhost ~]# grep -n '[^w]oo' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!

Lines containing digits can be found with "grep -n '[0-9]' test.txt".

[root@localhost ~]# grep -n '[0-9]' test.txt
4:the tongue is boneless but it breaks bones.12!
6:PI=3.14
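
To see exactly which characters a bracket expression consumed, the "-o" option can help. It is not used in the original text; it is a widely available grep extension (GNU and BSD grep) that prints only the matched part. A minimal sketch with illustrative variable names:

```shell
# grep -o prints only the part of the line that matched the pattern.
matched=$(printf '%s\n' '#woood #' | grep -o '[^w]oo')
printf '%s\n' "$matched"    # the matched text is an "o" followed by "oo"

# "wood" has only two o's and the character before them is "w",
# so nothing matches and grep exits with a non-zero status.
if printf '%s\n' 'a wood cross!' | grep -q '[^w]oo'; then
    wood_matches=yes
else
    wood_matches=no
fi
printf '%s\n' "$wood_matches"
```

This shows why "#woood #" satisfies "[^w]oo" while "a wood cross!" does not.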

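Matching Line Start "^" and Line End "$"

Basic regular expressions include two anchor metacharacters: "^" (start of line) and "$" (end of line). In the earlier example, searching for "the" returned every line containing "the"; to return only the lines that begin with "the", use the "^" metacharacter.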

[root@localhost ~]# grep -n '^the' test.txt

4:the tongue is boneless but it breaks bones.12!

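Lines beginning with a lowercase letter can be filtered with the rule "^[a-z]", lines beginning with an uppercase letter with "^[A-Z]", and lines that do not begin with a letter with "^[^a-zA-Z]".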

[root@localhost ~]# grep -n '^[a-z]' test.txt
1:he was short and fat.
2:he was weating a blue polo shirt with black pants.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
7:a wood cross!

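The "^" symbol behaves differently inside and outside the bracket set "[]": inside "[]" it negates the set, while outside it anchors the match to the start of the line. Conversely, to find lines that end with a particular character, use the "$" anchor. For example, the following command finds the lines that end with a period (.). Because the period is itself a metacharacter in regular expressions (covered below), it has to be escaped with "\" to turn it into an ordinary character.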

[root@localhost ~]# grep -n '\.$' test.txt
1:he was short and fat.
2:he was weating a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
14:Misfortunes never come alone/single.
15:I shouldn't have lett so tast.

To find blank lines, run "grep -n '^$' test.txt".

[root@localhost ~]# grep -n '^$' test.txt
9:
16:
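
The rule "^[^a-zA-Z]" mentioned above is not demonstrated in the original. The sketch below runs it on a four-line excerpt of test.txt (an assumption made to keep the example short) and contrasts it with "^$": a bracket expression must consume one character, so an empty line does not count as "not starting with a letter".

```shell
# Four-line excerpt: a lowercase start, an empty line, a "#" start, an uppercase start.
sample=$(mktemp)
printf '%s\n' 'a wood cross!' '' '#woood #' 'PI=3.14' > "$sample"

# Lines whose first character exists and is not a letter.
non_letter=$(grep -n '^[^a-zA-Z]' "$sample")

# Empty lines are matched by ^$ instead.
blank=$(grep -n '^$' "$sample")

printf 'non-letter start: %s\n' "$non_letter"
printf 'blank: %s\n' "$blank"
rm -f "$sample"
```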

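Matching Any Single Character "." and Repetition "*"

As mentioned above, the period (.) is also a metacharacter and stands for any single character. For example, the following command finds strings of the form "w??d": four characters, starting with w and ending with d.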

[root@localhost ~]# grep -n 'w..d' test.txt

5:google is the best tools for search keyword.
7:a wood cross!
8:Actions speak louder than words

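In the result above, "wood" matches the rule "w..d", as does the "word" inside "keyword" and "words". To search for oo, ooo, ooooo, and so on, use the asterisk (*) metacharacter. Note that "*" means zero or more repetitions of the preceding single character. "o*" matches zero "o" characters (the empty string) or one or more of them, and because the empty string is allowed, "grep -n 'o*' test.txt" prints every line of the file. With "oo*", the first o must be present and the second may occur zero or more times, so anything containing o, oo, ooo, and so on matches. Likewise, to find strings containing at least two o's, run "grep -n 'ooo*' test.txt".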

[root@localhost ~]# grep -n 'ooo*' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
7:a wood cross!
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!

Find strings that start with w, end with d, and have at least one o in between.

[root@localhost ~]# grep -n 'woo*d' test.txt
7:a wood cross!
10:#woood #
11:#woooooooood #

The following command finds strings that start with w and end with d, with any characters (or none) in between.

[root@localhost ~]# grep -n 'w.*d' test.txt
1:he was short and fat.
5:google is the best tools for search keyword.
7:a wood cross!
8:Actions speak louder than words
10:#woood #
11:#woooooooood #

The following command finds the lines that contain any digits.

[root@localhost ~]# grep -n '[0-9][0-9]*' test.txt
4:the tongue is boneless but it breaks bones.12!
6:PI=3.14

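Matching a Repetition Range "{}"

The examples above used "." and "*" to match zero to unlimited repetitions. What if the number of repetitions needs to be restricted to a range, for example three to five consecutive o's? This calls for the interval expression "{}" of basic regular expressions. In a basic regular expression an unescaped brace is an ordinary character, so the braces must be written with the escape character "\", as "\{" and "\}", to act as an interval. Usage is shown below.
Find two consecutive o's.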

[root@localhost ~]# grep -n 'o\{2\}' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
7:a wood cross!
10:#woood #
11:#woooooooood #
13:I bet this place is really spooky late at night!

Find strings that start with w, end with d, and contain 2 to 5 o's in between.

[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt
7:a wood cross!
10:#woood #

Find strings that start with w, end with d, and contain 2 or more o's in between.

[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt
7:a wood cross!
10:#woood #
11:#woooooooood #
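
The escaped braces are a basic regular expression convention. As a point of comparison, the sketch below runs the same interval in both syntaxes, using "grep -E" for the extended form; it assumes a three-line excerpt of test.txt in a temporary file.

```shell
# Excerpt containing the three "w...d" lines from test.txt.
sample=$(mktemp)
printf '%s\n' 'a wood cross!' '#woood #' '#woooooooood #' > "$sample"

# Basic regular expression: the interval braces are escaped.
bre=$(grep -n 'wo\{2,5\}d' "$sample")

# Extended regular expression: the braces are written bare.
ere=$(grep -En 'wo{2,5}d' "$sample")

printf '%s\n' "$bre"
rm -f "$sample"
```

Both commands select the same lines; the third line has more than five o's between w and d and is excluded.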

Metacharacter Summary

The examples above cover the most common basic regular expression metacharacters, summarized below.

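Symbol	Description	Example
`^`	Matches the start of a line; as the first character inside a bracket expression, negates the set.	`^abc` matches lines beginning with `abc`; `[^a-z]` matches any character that is not a lowercase letter.
`$`	Matches the end of a line.	`abc$` matches lines ending with `abc`.
`.`	Matches any single character except a newline.	`a.b` matches `a0b`, `a*b`, and so on (any character in the middle).
`\`	Escape character; turns a metacharacter into an ordinary character (and, in basic regular expressions, introduces operators such as `\{n,m\}`).	`\.` matches a literal period.
`*`	Matches the preceding element zero or more times.	`a*` matches the empty string, `a`, `aa`, and so on; `ab*c` matches `ac`, `abc`, and so on.
`[]`	Character set; matches any one of the characters inside the brackets.	`[abc]` matches `a`, `b`, or `c`; `[a-z]` matches any lowercase letter.
`[^ ]`	Negated character set; matches any one character not listed inside the brackets.	`[^0-9]` matches any non-digit character.
`[n1-n2]`	Character range; matches any one character within the given range.	`[0-9]` matches the digits 0-9; `[A-Z]` matches uppercase letters.
`\{n\}`	Matches the preceding element exactly `n` times.	`a\{2\}` matches `aa`.
`\{n,\}`	Matches the preceding element at least `n` times.	`a\{2,\}` matches `aa`, `aaa`, and so on.
`\{n,m\}`	Matches the preceding element `n` to `m` times (inclusive).	`a\{2,4\}` matches `aa`, `aaa`, or `aaaa`.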

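Extended Regular Expressions

Basic regular expressions are usually sufficient, but extended regular expressions, which offer a wider range of metacharacters, can simplify a command. For example, to list the lines of a file other than blank lines and lines beginning with "#" (commonly done to view the effective settings of a configuration file) using basic regular expressions, run "grep -v '^$' test.txt | grep -v '^#'". This needs a pipe and two searches. With extended regular expressions it simplifies to "egrep -v '^$|^#' test.txt", where the pipe symbol inside the single quotes means "or". By default grep interprets patterns as basic regular expressions; to use extended regular expressions, use egrep (equivalent to "grep -E") or awk. awk is covered in a later section, so egrep is used here. egrep is used in much the same way as grep: it searches one or more files for a pattern, which may be a single character, a string, a word, or a sentence.
Like basic regular expressions, extended regular expressions include several metacharacters. The common ones are listed below.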

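Symbol	Description	Example
`+`	Matches the preceding element one or more times (equivalent to `{1,}`).	`a+` matches `a`, `aa`, `aaa`, and so on; `ab+c` matches `abc`, `abbc`, and so on.
`?`	Matches the preceding element zero or one time (equivalent to `{0,1}`).	`a?` matches the empty string or `a`; `ab?c` matches `ac` or `abc`.
`|`	Alternation; matches either of the expressions on the two sides of the bar.	`a|b` matches `a` or `b`; `(cat|dog)` matches `cat` or `dog`.
`()`	Grouping; treats several characters as a single unit and captures the matched text for later reference.	`(ab)+` matches `ab`, `abab`, and so on; `([0-9]{3})-([0-9]{4})` captures the area code and the number separately.
`( )+`	Parentheses combined with `+`; the parenthesized content must occur at least once.	`(ab)+` matches `ab`, `abab`, `ababab`, and so on.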
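
The two ways of stripping blank lines and comment lines described above can be compared side by side. This is a sketch only: the configuration content is hypothetical, and "grep -E" is used as the equivalent of egrep.

```shell
# Hypothetical configuration file with a comment, a blank line, and two settings.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# sample configuration (hypothetical content)
listen=YES

#anonymous_enable=YES
local_enable=YES
EOF

# Basic regular expressions: two searches joined by a pipe.
two_pass=$(grep -v '^$' "$conf" | grep -v '^#')

# Extended regular expressions: one search with alternation.
one_pass=$(grep -Ev '^$|^#' "$conf")

printf '%s\n' "$one_pass"
rm -f "$conf"
```

Both forms leave only the two active settings.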

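Text Processors

The sed Tool

sed (Stream EDitor) is a powerful yet simple tool for parsing and transforming text. It reads text, edits it according to the given conditions (delete, replace, add, move, and so on), and finally outputs either all lines or only the processed ones. sed can carry out quite complex text processing without any interaction and is widely used in Shell scripts to automate tasks.
The sed workflow consists of three stages: read, execute, and display.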

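Step	Operation	Description	Key details
1	Read	Reads data line by line from the input stream (file, pipe, or terminal) into the pattern space.	- Reads by line; lines are delimited by newlines by default. - Each line read is stored in a temporary buffer (the pattern space).
2	Execute	Applies the user-specified commands (substitute, delete, append, and so on) to the contents of the pattern space.	- Commands run in order (for example `s/old/new/g`). - Commands can be restricted by address (line number or regular expression).
3	Display	Writes the processed pattern space to standard output (printed automatically by default).	- Unless `-n` is given, each processed line is printed automatically. - The `p` command controls output explicitly.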

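Note: by default all sed commands operate on the pattern space, so the input file itself is not changed; to keep the result, redirect the output to a file (or use the "-i" option).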
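
The note above can be checked with a small experiment. The sketch below assumes two temporary files and a two-line sample; the replacement word "steel" is arbitrary.

```shell
# Source file and a separate destination for the edited output.
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' 'PI=3.14' 'a wood cross!' > "$src"

# sed edits its pattern space and writes to standard output;
# the redirect is what stores the result.
sed 's/wood/steel/' "$src" > "$dst"

original=$(cat "$src")
edited=$(cat "$dst")
printf 'source still reads:\n%s\n' "$original"
printf 'redirected copy reads:\n%s\n' "$edited"
rm -f "$src" "$dst"
```

The source file is unchanged after the command; only the redirected copy contains the substitution.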

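Common sed Usage

sed is usually invoked in one of the two forms shown below. "file" is the target file to operate on; when there are several, separate them with spaces. scriptfile is a script file, specified with the "-f" option; the target files are then processed with the commands in that script.

sed [options] 'operation' file...

sed [options] -f scriptfile file...

The common sed options are as follows.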

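Option	Description	Example	Notes
`-n`	Suppresses the default output (only content explicitly printed with `p` is shown).	`sed -n '1,3p' file` (prints only lines 1-3)	Usually combined with the `p` (print) command.
`-e`	Specifies an editing command; may be repeated to chain several.	`sed -e 's/a/b/' -e '3d' file` (replaces a with b and deletes line 3)	Commands can also be separated with semicolons: `sed 's/a/b/;3d' file`.
`-f`	Reads sed commands from a file.	`sed -f script.sed file` (runs the commands in `script.sed`)	The script file must contain valid sed commands (such as `s/old/new/` or `1,5d`).
`-i[SUFFIX]`	Edits the file in place; optionally keeps a backup of the original with the given suffix.	`sed -i.bak 's/foo/bar/g' file` (edits the file and saves a backup as `file.bak`)	Use with care when no backup is kept. GNU sed allows the suffix to be omitted (`-i`); BSD/macOS sed requires an explicit argument (such as `-i ''`).
`-r` or `-E`	Enables extended regular expressions (`+`, `?`, `|`, and so on).	`sed -r 's/(foo|bar)/test/g' file` (matches `foo` or `bar`)	GNU sed accepts both `-r` and `-E`; BSD/macOS sed uses `-E`.
`-l`	Sets the line-wrap length used by the `l` command.	`sed -n -l 50 'l' file` (lists lines, wrapping at 50 characters)	Rarely used; check compatibility on your system.
`--help`	Shows help information.	`sed --help`	Prints a usage summary of the options.
`--version`	Shows the sed version.	`sed --version`	Useful for checking whether a feature is supported (such as the backup behavior of `-i`).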

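The "operation" specifies the action to perform on the file, that is, the sed command. It usually takes the form "[n1[,n2]] action". n1 and n2 are optional and select the lines to operate on; for example, an action applied to lines 5 through 20 is written "5,20 action". The common operations are as follows.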

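Operation	Description
a	Append: adds a line with the given content below the current line.
c	Change: replaces the selected lines with the given content.
d	Delete: deletes the selected lines.
i	Insert: inserts a line with the given content above the selected line.
p	Print: prints the selected lines, or all content when no lines are specified. Usually used together with the "-n" option. (Non-printing characters are shown in escaped form by the related `l` command.)
s	Substitute: replaces the specified text.
y	Transliterate: converts characters one for one.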

Usage Examples

The examples below use the test.txt file.

[root@localhost ~]# cat test.txt
he was short and fat.
he was weating a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
PI=3.14
a wood cross!
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

[root@localhost ~]# sed -n 'p' test.txt    # print everything, equivalent to cat test.txt

he was short and fat.
he was weating a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
PI=3.14
a wood cross!
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

[root@localhost ~]# sed -n '1,5{p;n}' test.txt    # print the odd lines among lines 1-5 (lines 1, 3, 5)

he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.

[root@localhost ~]# sed -n '10,${n;p}' test.txt    # from line 10 to the end of the file, print every second line (here lines 11, 13, 15)

#woooooooood #
I bet this place is really spooky late at night!
I shouldn't have lett so tast.

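Those are the basics of sed. When sed is combined with regular expressions the format differs slightly: the regular expression is enclosed in "/". For example:

[root@localhost ~]# sed -n '/the/p' test.txt    # print lines containing the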

the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.

[root@localhost ~]# sed -n '4,/the/p' test.txt    # print from line 4 to the first following line that contains the

the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.

[root@localhost ~]# sed -n '/the/=' test.txt    # print the line numbers of lines containing the; the equals sign (=) prints line numbers

4
5

[root@localhost ~]# sed -n '/^PI/p' test.txt    # print lines beginning with PI

PI=3.14

[root@localhost ~]# sed -n '/[0-9]$/p' test.txt    # print lines ending with a digit

PI=3.14

[root@localhost ~]# sed -n '/\<wood\>/p' test.txt    # print lines containing the word wood; \< and \> mark word boundaries

Deleting Matching Text (d)

The following examples show several common ways to delete with sed.

[root@localhost ~]# nl test.txt | sed '3d'    # delete line 3

1   he was short and fat.
2   he was weating a blue polo shirt with black pants.
4   the tongue is boneless but it breaks bones.12!
5   google is the best tools for search keyword.
6   PI=3.14
7   a wood cross!
8   Actions speak louder than words

9   #woood #
10   #woooooooood #
11   AxyzxyzxyzxyzC
12   I bet this place is really spooky late at night!
13   Misfortunes never come alone/single.
14   I shouldn't have lett so tast.

[root@localhost ~]# nl test.txt | sed '3,5d'    # delete lines 3-5

1   he was short and fat.
2   he was weating a blue polo shirt with black pants.
6   PI=3.14
7   a wood cross!
8   Actions speak louder than words

9   #woood #
10   #woooooooood #
11   AxyzxyzxyzxyzC
12   I bet this place is really spooky late at night!
13   Misfortunes never come alone/single.
14   I shouldn't have lett so tast.

[root@localhost ~]# nl test.txt | sed '/cross/d'    # delete lines containing cross (original line 7 is removed); to delete the lines that do not contain cross, negate with !, as in '/cross/!d'
1   he was short and fat.
2   he was weating a blue polo shirt with black pants.
3   The home of Football on BBC Sport online.
4   the tongue is boneless but it breaks bones.12!
5   google is the best tools for search keyword.
6   PI=3.14
8   Actions speak louder than words

9   #woood #
10   #woooooooood #
11   AxyzxyzxyzxyzC
12   I bet this place is really spooky late at night!
13   Misfortunes never come alone/single.
14   I shouldn't have lett so tast.

[root@localhost ~]# sed '/^[a-z]/d' test.txt    # delete lines beginning with a lowercase letter

The home of Football on BBC Sport online.
PI=3.14
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

[root@localhost ~]# sed '/\.$/d' test.txt    # delete lines ending with "."

the tongue is boneless but it breaks bones.12!
PI=3.14
a wood cross!
Actions speak louder than words

#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!

[root@localhost ~]# sed '/^$/d' test.txt    # delete all blank lines

he was short and fat.
he was weating a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
PI=3.14
a wood cross!
Actions speak louder than words
#woood #
#woooooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

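Replacing Matching Text

Replacement with sed uses the s (string substitution), c (whole-line or whole-block replacement), and y (character transliteration) commands. Common usage is shown below.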

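sed 's/the/THE/' test.txt    # replace the first the on each line with THE

sed 's/l/L/2' test.txt    # replace the 2nd l on each line with L

sed 's/the/THE/g' test.txt    # replace every the in the file with THE

sed 's/o//g' test.txt    # delete every o in the file (replace with the empty string)

sed 's/^/#/' test.txt    # insert # at the start of every line

sed '/the/s/^/#/' test.txt    # insert # at the start of every line containing the

sed 's/$/EOF/' test.txt    # append the string EOF to the end of every line

sed '3,5s/the/THE/g' test.txt    # replace every the on lines 3-5 with THE

sed '/the/s/o/O/g' test.txt    # on every line containing the, replace each o with O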
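
The difference between the plain, numbered, and "g" forms of the s command is easiest to see on a single line. The sketch below uses a made-up sentence (not from test.txt) that contains "the" four times, once inside the word "other".

```shell
# Made-up input: "the" occurs four times, including inside "other".
line='the cat and the other the'

# No flag: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/the/THE/')

# Numeric flag: only the 2nd occurrence is replaced.
second=$(printf '%s\n' "$line" | sed 's/the/THE/2')

# g flag: every occurrence is replaced, including the one inside "other".
all=$(printf '%s\n' "$line" | sed 's/the/THE/g')

printf '%s\n' "$first" "$second" "$all"
```

Note that s matches substrings, not words, which is why "other" becomes "oTHEr" with the g flag.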

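Moving Matching Text

The following commands are commonly used when moving matching text with sed.

H: appends the line to the hold space (sed's "clipboard");
g, G: overwrites / appends to the specified line with the contents of the hold space;
w: writes to a file;
r: reads the specified file;
a: appends the given content.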

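sed '/the/{H;d};$G' test.txt    # move the lines containing the to the end of the file; {;} groups several operations

sed '1,5{H;d};17G' test.txt    # move lines 1-5 after line 17 (the file must have at least 17 lines; for the 16-line test.txt above, use $G to move them to the end)

sed '/the/w out.file' test.txt    # save the lines containing the to the file out.file

sed '/the/r /etc/hostname' test.txt    # add the content of /etc/hostname after every line containing the

sed '3aNew' test.txt    # insert a new line with the content New after line 3

sed '/the/aNew' test.txt    # insert a new line with the content New after every line containing the

sed '3aNew1\nNew2' test.txt    # insert several lines after line 3; the \n in the middle is a line break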
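
A small run of the H/d/G combination makes its behavior visible. The sketch below assumes GNU sed (the "{H;d}" form without a semicolon before the closing brace is not accepted by every sed) and a three-line made-up input.

```shell
# Three-line input; only the middle line contains "the".
moved=$(printf '%s\n' 'a' 'the b' 'c' | sed '/the/{H;d};$G')

# H appends a newline plus the line to the initially empty hold space,
# so the moved text is preceded by an empty line when G appends it.
printf '%s\n' "$moved"
```

The matching line ends up last, separated from the rest by one empty line that comes from the hold space.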

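Editing Files with a Script

A sed script stores several editing commands in a file (one command per line) and is invoked with the "-f" option. For example, the commands of the following one-liner, which moves lines 1-5 after line 17, can be stored in such a script, one per line.

sed '1,5{H;d};17G' test.txt    # move lines 1-5 after line 17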
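
The original shows only the one-liner. A script-file version might look like the sketch below. The temporary script path, the eight-line numeric input, and the use of "$G" in place of "17G" (so that the short input has a target line) are all choices made for this illustration.

```shell
# Write one sed command per line into a script file.
script=$(mktemp)
cat > "$script" <<'EOF'
1,5H
1,5d
$G
EOF

# Run the script with -f against eight numbered lines.
result=$(printf '%s\n' 1 2 3 4 5 6 7 8 | sed -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```

Lines 1-5 are collected in the hold space, deleted from the output, and appended after the last line.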

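Example: Editing a File Directly with sed

Write a script that adjusts the vsftpd configuration so that anonymous users are disabled while local users are allowed (including write access).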

[root@localhost ~]# vim local_only_ftp.sh

#!/bin/bash

# Paths of the sample file and the configuration file
SAMPLE="/usr/share/doc/vsftpd-3.0.2/EXAMPLE/INTERNET_SITE/vsftpd.conf"
CONFIG="/etc/vsftpd/vsftpd.conf"

# Back up the original configuration: if /etc/vsftpd/vsftpd.conf.bak does not exist yet, create it with cp
[ ! -e "$CONFIG.bak" ] && cp "$CONFIG" "$CONFIG.bak"

# Adjust the sample configuration and overwrite the existing file
sed -e '/^anonymous_enable/s/YES/NO/g' "$SAMPLE" > "$CONFIG"
sed -i -e '/^local_enable/s/NO/YES/g' -e '/^write_enable/s/NO/YES/g' "$CONFIG"
# Append listen=YES when the file has no listen setting
grep "listen" "$CONFIG" || sed -i '$alisten=YES' "$CONFIG"

# Restart the vsftpd service and enable it at boot
systemctl restart vsftpd
systemctl enable vsftpd
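
Before a script like this touches files under /etc, its sed expressions can be tried on sample text. The sketch below is a dry run under stated assumptions: the three input lines are a hypothetical excerpt of a vsftpd sample configuration, and nothing is written to disk.

```shell
# Hypothetical excerpt of a vsftpd.conf sample; the real file has many more lines.
before=$(printf '%s\n' 'anonymous_enable=YES' 'local_enable=NO' 'write_enable=NO')

# Apply the same three expressions as the script, reading from a pipe
# instead of editing a file in place.
after=$(printf '%s\n' "$before" | sed \
    -e '/^anonymous_enable/s/YES/NO/g' \
    -e '/^local_enable/s/NO/YES/g' \
    -e '/^write_enable/s/NO/YES/g')

# Mirror the script's last step: add listen=YES only when no "listen" line exists.
if ! printf '%s\n' "$after" | grep -q 'listen'; then
    after=$(printf '%s\nlisten=YES' "$after")
fi

printf '%s\n' "$after"
```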

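The awk Tool

Common awk Usage

awk is usually invoked in the forms shown below, where the single quotes and the braces "{}" enclose the actions to perform on the data. awk can process the target files directly, or read a script with "-f" and process the files with it.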

awk options 'pattern or condition {actions}' file1 file2 ...    # filter and print the content that matches

awk -f scriptfile file1 file2 ...    # read the actions from a script, then filter and print

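As mentioned earlier, sed usually works on whole lines, whereas awk tends to split each line into "fields" and process those; by default fields are separated by spaces or tabs. awk prints its results with print. awk commands can use the logical operators "&&" (and), "||" (or), and "!" (not), and can perform simple arithmetic: +, -, *, /, %, and ^ mean addition, subtraction, multiplication, division, remainder, and exponentiation. On Linux, /etc/passwd is a typical structured file whose fields are separated by ":", and most Linux log files are structured as well; extracting information from such files is part of day-to-day operations work. To list the user name, user ID, and group ID columns of /etc/passwd, run the following awk command.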

[root@localhost ~]# awk -F ':' '{print $1,$3,$4}' /etc/passwd
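
The logical and arithmetic operators described above can be tried without reading the real /etc/passwd. In the sketch below the three passwd-style records, including the user alice, are hypothetical sample data.

```shell
# Hypothetical passwd-style records (not read from the real /etc/passwd).
passwd_sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/bash')

# "&&" combines two conditions: UID of at least 1000 and a bash login shell.
regular=$(printf '%s\n' "$passwd_sample" |
    awk -F ':' '($3 >= 1000) && ($7 == "/bin/bash") {print $1}')

# Arithmetic in a BEGIN block: remainder and exponentiation.
math=$(awk 'BEGIN {print 7 % 3, 2 ^ 10}')

printf '%s\n' "$regular" "$math"
```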

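awk has several special built-in variables that can be used directly:

Variable	Description
FS	Field separator for each line of input; defaults to spaces or tabs.
NF	Number of fields in the line being processed.
NR	Number (ordinal) of the line being processed.
$0	Entire content of the line being processed.
$n	The n-th field (column) of the line being processed.
FILENAME	Name of the file being processed.
RS	Record separator; defaults to \n, so each line is one record.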
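
A short run shows several of these variables together. The sketch below uses two hypothetical passwd-style records and a small made-up text for the RS case.

```shell
# FS set in BEGIN; NR is the record number, NF the field count, $NF the last field.
report=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin' |
    awk 'BEGIN {FS = ":"} {print NR, NF, $1, $NF}')

# With RS set to the empty string, records are paragraphs separated by blank lines,
# so NR at the end is the paragraph count.
paragraphs=$(printf 'a\nb\n\nc\n\n\nd\n' | awk 'BEGIN {RS = ""} END {print NR}')

printf '%s\n' "$report" "$paragraphs"
```

Consecutive blank lines count as a single separator when RS is empty, so the second command reports three paragraphs.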

Usage Examples

Printing Text by Line

awk '{print}' test.txt    # print everything, equivalent to cat test.txt

awk '{print $0}' test.txt    # print everything, equivalent to cat test.txt

awk 'NR==1,NR==3{print}' test.txt    # print lines 1-3

awk '(NR>=1)&&(NR<=3){print}' test.txt    # print lines 1-3

awk 'NR==1||NR==3{print}' test.txt    # print line 1 and line 3

awk '(NR%2)==1{print}' test.txt    # print all odd-numbered lines

awk '(NR%2)==0{print}' test.txt    # print all even-numbered lines

awk '/^root/{print}' /etc/passwd    # print lines beginning with root

awk '/nologin$/{print}' /etc/passwd    # print lines ending with nologin

awk 'BEGIN {x=0};/\/bin\/bash$/{x++};END {print x}' /etc/passwd    # count the lines ending with /bin/bash, equivalent to grep -c "/bin/bash$" /etc/passwd

awk 'BEGIN{RS=""};END{print NR}' /etc/squid/squid.conf    # count the text paragraphs separated by blank lines

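Printing Text by Field

awk '{print $3}' test.txt    # print the 3rd field (separated by spaces or tabs) of each line

awk '{print $1,$3}' test.txt    # print the 1st and 3rd fields of each line

awk -F ":" '$2==""{print}' /etc/shadow    # print the shadow records of users with an empty password

awk 'BEGIN {FS=":"}; $2==""{print}' /etc/shadow    # print the shadow records of users with an empty password

awk -F ":" '$7~"/bash"{print $1}' /etc/passwd    # using colon as the separator, print the 1st field of lines whose 7th field contains /bash

awk '($1~"nfs")&&(NF==8){print $1,$2}' /etc/services    # print the 1st and 2nd fields of lines that have 8 fields and whose 1st field contains nfs

awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd    # print every line whose 7th field is neither /bin/bash nor /sbin/nologin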

Calling Shell Commands Through Pipes and Double Quotes

awk -F: '/bash$/{print | "wc -l"}' /etc/passwd    # call wc -l to count the users whose shell is bash, equivalent to grep -c "bash$" /etc/passwd

awk 'BEGIN {while ("w" | getline) n++ ; {print n-2}}'    # call the w command and use it to count the logged-in users

awk 'BEGIN { "hostname" | getline ; print $0}'    # call hostname and print the current host name
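
The getline examples above read command output into $0. A variant that reads into a named variable and closes the pipe afterwards is sketched below; the command "echo hello" and the variable names are illustrative only.

```shell
# Read one line of a command's output into the variable "line",
# check that the read succeeded, then close the pipe.
greeting=$(awk 'BEGIN {
    cmd = "echo hello"
    if ((cmd | getline line) > 0)
        print "got: " line
    close(cmd)
}')

printf '%s\n' "$greeting"
```

Reading into a variable leaves $0 and NF untouched, and close() lets the same command be run again later in the program.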