Linux总结（十五）：linux文本处理工具——高级sed

最新推荐文章于 2023-12-17 16:28:51 发布

科大小笨

最新推荐文章于 2023-12-17 16:28:51 发布

阅读量226

点赞数

分类专栏： Linux操作系统

本文链接：https://blog.csdn.net/TuZiFaDai/article/details/94843885

版权

Linux操作系统专栏收录该内容

25 篇文章 9 订阅

订阅专栏

一、sed 多行命令

sed基本命令格式：sed [选项] ‘正则或者数字 {命令脚本; 命令脚本}’ 文件

有时我们需要对跨多行的数据执行特定操作，如果用普通的 sed 编辑器命令来处理文本，就不可能发现这种被分开的情况。sed 包含了三个可用来处理多行文本的特殊命令：D,N,P

（1）Next 命令（N）：将数据流中的下一行加进来创建一个多行组来处理。

（2）Delete（D）：删除多行组中的一行。

（3）Print（P）：打印多行组中的一行。

二、sed N 将两行数据当成一行来处理

N 命令会将下一行文本内容添加到缓冲区已有数据之后（之间用换行符分隔），从而使前后两个文本行同时位于缓冲区中，sed 命令会将这两行数据当成一行来处理。

1、基本用法

"1、文本文件中的两行在 sed 的输出中成了一行。"
[root@localhost ~]cat data2.txt
#This is the header line.
#This is the first data line.
#This is the second data line.
#This is the last line.
[root@localhost ~]sed '/first/{ N ; s/\n/ / }' data2.txt
#This is the header line.
#This is the first data line. This is the second data line.
#This is the last line.
"2、从字符串中删掉了换行符，导致两行合并成一行"
[root@localhost ~]cat data3.txt
#On Tuesday, the Linux System
#Administrator's group meeting will be held.
#All System Administrators should attend.
#Thank you for your attendance.
[root@localhost ~]sed 'N ; s/System Administrator/Desktop User/' data3.txt
#On Tuesday, the Linux Desktop User's group meeting will be held.
#All Desktop Users should attend.
#Thank you for your attendance.
"3、替换后，换行不变"
[root@localhost ~]sed 'N
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data3.txt
#On Tuesday, the Linux Desktop
#User's group meeting will be held.
#All Desktop Users should attend.
#Thank you for your attendance.

2、解决代码“3”中的问题

第一个替换命令专门查找两个单词间的换行符，并将它放在了替换字符串中。这样就能在第一个替换命令专门在两个检索词之间寻找换行符，并将其纳入替换字符串。这样就允许在新文本的同样位置添加换行符了。但这个脚本中仍有个小问题，即它总是在执行 sed 命令前将下一行文本读入到缓冲区中，当它到了后一行文本时，就没有下一行可读了，此时 N 命令会叫 sed 程序停止，这就导致，如果要匹配的文本正好在最后一行中，sed 命令将不会发现要匹配的数据。

[root@localhost ~]sed '
> s/System\nAdministrator/Desktop\nUser/
> N
> s/System Administrator/Desktop User/
> ' data3.txt
#On Tuesday, the Linux Desktop
#User's group meeting will be held.
#All Desktop Users should attend.
#Thank you for your attendance.

三、D 多行删除命令

D 命令将缓冲区中第一个换行符（包括换行符）之前的内容删除掉。

1、基本实例

"1、删除第一行"
[root@localhost ~]cat data4.txt
#On Tuesday, the Linux System
#Administrator's group meeting will be held.
#All System Administrators should attend.
[root@localhost ~]sed 'N ; /System\nAdministrator/D' data4.txt
#Administrator's group meeting will be held.
#All System Administrators should attend.
"2、删除空白行"
[root@localhost ~]cat data5.txt

#This is the header line.
#This is a data line.

#This is the last line.
[root@localhost ~]sed '/^$/{N ; /header/D}' data5.txt
#This is the header line.
#This is a data line.

#This is the last line.

文本的第二行被 N 命令加到了缓冲区，因此 sed 命令第一次匹配就是成功，而 D 命令会将缓冲区中第一个换行符之前（也就是第一行）的数据删除，所以，得到了“代码1”结果。

sed会查找空白行，然后用 N 命令来将下一文本行添加到缓冲区。此时如果缓冲区的内容中含有单词 header，则 D 命令会删除缓冲区中的第一行。

四、P 多行打印命令

同 d 和 D 之间的区别一样，P（大写）命令和单行打印命令 p（小写）不同，对于具有多行数据的缓冲区来说，它只会打印缓冲区中的第一行，也就是首个换行符之前的所有内容。

1、输出对比

表 1 P 命令和 p 命令的对比
P（大写）命令	p（小写）命令
[root@localhost ~]# sed '/.*/N;P' aaa aaa bbb ccc ccc ddd eee eee fff	[root@localhost ~]# sed '/.*/N;p' aaa bbb aaa bbb ccc ddd ccc ddd eee fff eee fff

第一个 sed 命令，每次都使用 N 将下一行内容追加到缓冲区内容的后面（用换行符间隔），也就是说，第一次时缓冲区中的内容为 aaa\nbbb，但 P（大写）命令的作用的打印换行符之前的内容，也就是 aaa，之后则是 sed 在自动输出功能输出 aaa 和 bbb（sed 命令会自动将 \n 输出为换行），依次类推，就输出了所看到的结果。第二个 sed 命令，使用的是 p （小写）单行打印命令，它会将缓冲区中的所有内容全部打印出来（\n 会自动输出为换行），因此，出现了看到的结果。

五、sed 保持空间

前面我们一直说，sed 命令处理的是缓冲区中的内容，其实这里的缓冲区，应称为模式空间。值得一提的是，模式空间并不是 sed 命令保存文件的唯一空间。sed 还有另一块称为保持空间的缓冲区域，它可以用来临时存储一些数据。

1、基本命令

sed 保持空间命令
命令	功能
h	将模式空间中的内容复制到保持空间
H	将模式空间中的内容附加到保持空间
g	将保持空间中的内容复制到模式空间
G	将保持空间中的内容附加到模式空间
x	交换模式空间和保持空间中的内容

通常，在使用 h 或 H 命令将字符串移动到保持空间后，最终还要用 g、G 或 x 命令将保存的字符串移回模式空间。保持空间最直接的作用是，一旦我们将模式空间中所有的文件复制到保持空间中，就可以清空模式空间来加载其他要处理的文本内容。

2、基本应用

[root@localhost ~]cat data2.txt
#This is the header line.
#This is the first data line.
#This is the second data line.
#This is the last line.
[root@localhost ~]sed -n '/first/ {h ; p ; n ; p ; g ; p }' data2.txt
#This is the first data line.
#This is the second data line.
#This is the first data line.

执行流程：

（1）sed脚本命令用正则表达式过滤出含有单词first的行；

（2）当含有单词 first 的行出现时，h 命令将该行放到保持空间；

（3）p 命令打印模式空间也就是第一个数据行的内容；

（4）n 命令提取数据流中的下一行（This is the second data line），并将它放到模式空间；

（5）p 命令打印模式空间的内容，现在是第二个数据行；

（6）g 命令将保持空间的内容（This is the first data line）放回模式空间，替换当前文本；

（7）p 命令打印模式空间的当前内容，现在变回第一个数据行了。

六、b 分支命令

通常，sed 程序的执行过程会从第一个脚本命令开始，一直执行到最后一个脚本命令（D 命令是个例外，它会强制 sed 返回到脚本的顶部，而不读取新的行）。sed 提供了 b 分支命令来改变命令脚本的执行流程，其结果与结构化编程类似。

1、基本格式

[address]b [label]

address 参数决定了哪些行的数据会触发分支命令，label 参数定义了要跳转到的位置。

2、基本实例

"1、跳到脚本末尾，即不执行s/This is/Is this/ ; s/line./test?/"
[root@localhost ~]cat data2.txt
#This is the header line.
#This is the first data line.
#This is the second data line.
#This is the last line.
[root@localhost ~]sed '{2,3b ; s/This is/Is this/ ; s/line./test?/}' data2.txt
#Is this the header test?
#This is the first data line.
#This is the second data line.
#Is this the last test?
"2、跳过脚本，不执行s/This is the/No jump on/"
[root@localhost ~]sed '{/first/b jump1 ; s/This is the/No jump on/
> :jump1
> s/This is the/Jump here on/}' data2.txt
#No jump on header line
#Jump here on first data line
#No jump on second data line
#No jump on last line
"3、b 分支命令除了可以向后跳转，还可以向前跳转"
[root@localhost ~]echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> /,/b start
> }'
#This is, a, test, to, remove, commas.
#This is a, test, to, remove, commas.
#This is a test, to, remove, commas.
#This is a test to, remove, commas.
#This is a test to remove, commas.
#This is a test to remove commas.

脚本1解释：需要注意的是，如果没有加 label 参数，跳转命令会跳转到脚本的结尾。

脚本2解释：如果我们不想直接跳到脚本的结尾，可以为 b 命令指定一个标签（也就是格式中的 label，最多为 7 个字符长度）。在使用此该标签时，要以冒号开始（比如 :label2），并将其放到要跳过的脚本命令之后。这样，当 sed 命令匹配并处理该行文本时，会跳过标签之前所有的脚本命令，但会执行标签之后的脚本命令。

脚本3解释：当缓冲区中的行内容中有逗号时，脚本命令就会一直循环执行，每次迭代都会删除文本中的第一个逗号，并打印字符串，直至内容中没有逗号。

七、t 测试命令

类似于 b 分支命令，t 命令也可以用来改变 sed 脚本的执行流程。t 测试命令会根据 s 替换命令的结果，如果匹配并替换成功，则脚本的执行会跳转到指定的标签；反之，t 命令无效。

1、基本格式

[address]t [label]

2、基本使用

"1、s执行成功，运行t，跳转到脚本最后"
[root@localhost ~]sed '{
> s/first/matched/
> t
> s/This is the/No match on/
> }' data2.txt
#No match on header line
#This is the matched data line
#No match on second data line
#No match on last line
"2、如果匹配并替换成功，命令会直接跳过后面的替换命令"
[root@localhost ~]echo "This, is, a, test, to, remove, commas. " | sed -n '{
> :start
> s/,//1p
> t start
> }'
#This is, a, test, to, remove, commas.
#This is a, test, to, remove, commas.
#This is a test, to, remove, commas.
#This is a test to, remove, commas.
#This is a test to remove, commas.
#This is a test to remove commas.

脚本“1”解释：跟分支命令一样，在没有指定标签的情况下，如果 s 命令替换成功，sed 会跳转到脚本的结尾（相当于不执行任何脚本命令）

脚本“2”解释：，第一个替换命令会查找模式文本 first，如果匹配并替换成功，命令会直接跳过后面的替换命令；反之，如果第一个替换命令未能匹配成功，第二个替换命令就会被执行。
区别：b的跳转起点是正则表达式匹配成功，t的跳转起点是S执行成功。