Linux Shell Scripting (9): Text Filtering (the sed Command)

晴空❄雨霁

Column: Linux Shell Scripting. Tags: linux, shell, sed

Article link: https://blog.csdn.net/lingfengliujian/article/details/78278744

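Text filtering

Regular expressions: Linux Shell Scripting (5): Text Filtering (Regular Expressions)
grep command: Linux Shell Scripting (6): Text Filtering (the grep Command)
find command: Linux Shell Scripting (7): Text Filtering (the find Command)
awk command: Linux Shell Scripting (8): Text Filtering (the awk Command)
sed command: Linux Shell Scripting (9): Text Filtering (the sed Command)
Merging and splitting (sort, uniq, join, cut, paste, split)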

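The sed command

sed is a stream editor for filtering and transforming text. It is especially handy when the same change has to be applied to dozens of configuration files.
sed reads its input one line at a time, runs the script against that line, and writes the result. It never needs to hold the whole file in memory, so it copes well with large files.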

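1. How sed works

sed reads its input from files or from a pipe.
It does not modify the source file directly. Each line is copied into a buffer called the pattern space.
The script is applied to the contents of the pattern space and the result is written out, by default to standard output (the screen).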
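The three steps above can be checked with a small experiment. This is a minimal sketch that assumes a scratch file created with mktemp; the file and the variable names are made up for the illustration.

```shell
# Hypothetical scratch file, created only for this demonstration.
demo_file=$(mktemp)
printf 'alpha\nbeta\n' > "$demo_file"

# sed edits its own copy of each line (the pattern space) and prints the result.
printed=$(sed 's/alpha/ALPHA/' "$demo_file")

# The file on disk still holds the original text.
on_disk=$(cat "$demo_file")

echo "$printed"    # ALPHA, then beta
echo "$on_disk"    # alpha, then beta
rm -f "$demo_file"
```

The substitution shows up only in what sed printed; the file itself is unchanged.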

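2. Basic structure of a sed command

sed [options] [script] [input files]
If no input file is given, sed processes standard input (keyboard input by default). When neither -e nor -f is used, the script is the first argument that does not start with "-".

Option	Meaning
-n	Quiet mode. By default sed prints the pattern space automatically once all commands of the script have run on a line; this option suppresses that automatic printing.
-e	Adds a script to the commands to run. Repeat it to supply several scripts.
-f	Reads the script from a file, which allows reusable sed scripts.
-i	Edits the source file in place: the processed output is written back to the source file (the source file is modified!). Use with care!
-l N	Sets the line-wrap length for the l command, which prints the pattern space with non-printing characters made visible.
-r	Uses extended regular expressions in the script.
-s	By default sed treats the files named on the command line as one long continuous input stream. With this GNU sed option each file is handled separately, so line addresses such as $ and address ranges are evaluated per file.
-u	Buffers input and output as little as possible.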
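The effect of -n and -e is easiest to see side by side. The following is a small sketch on a made-up three-line input; the variable names are only for illustration.

```shell
# Sample three-line input used for every command below.
input=$(printf 'one\ntwo\nthree\n')

# Without -n, the p command prints line 2 in addition to the automatic print.
with_autoprint=$(printf '%s\n' "$input" | sed '2p')

# With -n, only lines explicitly printed by p are shown.
only_line_two=$(printf '%s\n' "$input" | sed -n '2p')

# -e may be repeated; the scripts run in order on each line.
two_scripts=$(printf '%s\n' "$input" | sed -e 's/one/1/' -e 's/two/2/')

echo "$with_autoprint"   # one, two, two, three
echo "$only_line_two"    # two
echo "$two_scripts"      # 1, 2, three
```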

3. Text substitution with sed

3.1 Replacing a string in the given text

sed 's/pattern/replace_str/' file
cat file | sed 's/pattern/replace_str/'

jianliu@ubuntu:~/aa$ cat test.txt
my neigbors are bob hanlun and jack

jianliu@ubuntu:~/aa$ sed 's/bob/kang/' test.txt
my neigbors are kang hanlun and jack

jianliu@ubuntu:~/aa$ cat test.txt
my neigbors are bob hanlun and jack

jianliu@ubuntu:~/aa$ cat test.txt | sed 's/bob/kang/'
my neigbors are kang hanlun and jack

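3.2 Saving the change with -i: the substitution is applied to the original file (the source file is modified).

By default sed only prints the text after substitution and leaves the file untouched.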

jianliu@ubuntu:~/aa$ cat test.txt
my neigbors are bob hanlun and jack

jianliu@ubuntu:~/aa$ sed -i 's/bob/kang/' test.txt

jianliu@ubuntu:~/aa$ cat test.txt
my neigbors are kang hanlun and jack

Without -i, the result can be saved by redirecting it to a new file and then replacing the original:

jianliu@ubuntu:~/aa$ cat test.txt
my neigbors are kang hanlun and jack

jianliu@ubuntu:~/aa$ sed 's/kang/bob/' test.txt > newtest.txt
jianliu@ubuntu:~/aa$ mv newtest.txt test.txt

jianliu@ubuntu:~/aa$ cat test.txt
my neigbors are bob hanlun and jack
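Redirecting straight back into the input file (sed ... test.txt > test.txt) does not work, because the shell truncates the file before sed reads it. A slightly more defensive version of the redirect-and-rename approach might look like the sketch below. The file names come from mktemp and the sample content is invented for the demonstration.

```shell
# Stand-in for the file to edit (hypothetical content).
target=$(mktemp)
printf 'name=kang\n' > "$target"

# Write the result to a separate temporary file first.
tmp=$(mktemp)
if sed 's/kang/bob/' "$target" > "$tmp"; then
    # Replace the original only when sed succeeded.
    mv "$tmp" "$target"
else
    rm -f "$tmp"
fi

result=$(cat "$target")
echo "$result"    # name=bob
rm -f "$target"
```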

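3.3 Appending g to replace every occurrence

By default, sed replaces only the first match of the pattern on each line.
sed 's/pattern/replace_str/g' file

The g flag tells sed to replace every match.
Sometimes only the matches from the Nth one onward should be replaced. GNU sed supports this by combining a number with g, as in /Ng: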

jianliu@ubuntu:~/aa$ echo thisthisthisthis | sed 's/this/THIS/2g'
thisTHISTHISTHIS

jianliu@ubuntu:~/aa$ echo thisthisthisthis | sed 's/this/THIS/3g'
thisthisTHISTHIS

jianliu@ubuntu:~/aa$ echo thisthisthisthis | sed 's/this/THIS/4g'
thisthisthisTHIS

jianliu@ubuntu:~/aa$ echo thisthisthisthis | sed 's/this/THIS/g'
THISTHISTHISTHIS
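A number without g is also allowed and replaces only that one occurrence. The short sketch below uses the same input as above and also shows that occurrences are counted separately on each line.

```shell
# A number alone replaces just the 2nd match on the line.
second_only=$(echo thisthisthisthis | sed 's/this/THIS/2')

# The count restarts on every input line.
per_line=$(printf 'aa\naa\n' | sed 's/a/X/2')

echo "$second_only"   # thisTHISthisthis
echo "$per_line"      # aX on both lines
```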

The / character serves as the delimiter in the s command, but any other character can be used instead:
sed 's:text:replace:g'
sed 's|text|replace|g'

jianliu@ubuntu:~/aa$ echo thisthisthisthis | sed 's|this|THIS|g'
THISTHISTHISTHIS

jianliu@ubuntu:~/aa$ echo thisthisthisthis | sed 's:this:THIS:g'
THISTHISTHISTHIS

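When the delimiter character appears inside the pattern, it must be escaped with a backslash:
sed 's|te\|xt|replace|g'
Here \| is an escaped delimiter appearing inside the pattern.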

jianliu@ubuntu:~/aa$ echo th\|isthisth\|isth\|is | sed 's|th\|is|THIS|g'
THISthisTHISTHIS
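Alternative delimiters pay off most when the pattern contains slashes, for example file paths. The sketch below compares the two spellings on a made-up input line.

```shell
# Hypothetical input line containing a path.
path_line='bin=/usr/local/bin'

# With / as the delimiter, every slash in the path needs a backslash.
with_slash=$(echo "$path_line" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter, the path can be written as-is.
with_pipe=$(echo "$path_line" | sed 's|/usr/local|/opt|')

echo "$with_slash"   # bin=/opt/bin
echo "$with_pipe"    # bin=/opt/bin
```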

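3.4 Removing blank lines

sed '/^$/d' file
/pattern/d deletes every line that matches the pattern.
In a blank line the end-of-line anchor immediately follows the start-of-line anchor, so ^$ matches it.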

jianliu@ubuntu:~/aa$ cat test0.txt
word1  1

aword2 2
word3 3


1word4 4
word@ 5
wor3 6
wo3 7
word12 8
abcde 9
wore21 10
12345 11

jianliu@ubuntu:~/aa$ sed '/^$/d' test0.txt
word1  1
aword2 2
word3 3
1word4 4
word@ 5
wor3 6
wo3 7
word12 8
abcde 9
wore21 10
12345 11
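Note that ^$ matches only lines that are completely empty. If lines containing nothing but spaces or tabs should go as well, one possible variant uses the POSIX character class [[:space:]]. The sample input below is invented for the illustration.

```shell
# Input with an empty line, a line of spaces, and a line with a tab.
sample=$(printf 'first\n\n   \n\t\nlast\n')

# ^$ removes only the truly empty line.
strict=$(printf '%s\n' "$sample" | sed '/^$/d')

# ^[[:space:]]*$ also removes whitespace-only lines.
loose=$(printf '%s\n' "$sample" | sed '/^[[:space:]]*$/d')

echo "$loose"    # first, then last
```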

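3.5 Replacing directly in the file

When a file name is passed to sed, the result is written to stdout. To modify the file itself, use the -i option:
Format: sed -i 's/pattern/replace_str/' filename

#Replace every three-digit number in the file with a given string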
jianliu@ubuntu:~/aa$ cat test.txt
11 abc 111 this 9 file contains 111 11 88 numbers 0000

jianliu@ubuntu:~/aa$ sed -i 's/\b[0-9]\{3\}\b/NUMBER/g' test.txt

jianliu@ubuntu:~/aa$ cat test.txt
11 abc NUMBER this 9 file contains NUMBER 11 88 numbers 0000


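#The regular expression \b[0-9]\{3\}\b matches a three-digit number. [0-9]
#is the range of digits, 0 through 9. {3} matches the preceding character three times; the backslashes in \{3\} escape { and }.

#\b marks a word boundary. Without it, any three consecutive digits match, even inside a longer number: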

jianliu@ubuntu:~/aa$ sed -i 's/[0-9]\{3\}/NUMBER/g' test.txt
jianliu@ubuntu:~/aa$ cat test.txt
11 abc NUMBER this 9 file contains NUMBER 11 88 numbers NUMBER0

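Adding a suffix such as .bak creates a copy of the original file.
sed -i.bak 's/pattern/replace_str/' file
sed then not only performs the substitution in the file but also creates file.bak, which holds a copy of the original content. With GNU sed the suffix must be attached directly to -i, with no space in between.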
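The backup behaviour can be tried safely in a throwaway directory. This sketch assumes a scratch directory from mktemp -d and a hypothetical file named app.conf.

```shell
# Scratch directory and a made-up config file.
workdir=$(mktemp -d)
printf 'color=red\n' > "$workdir/app.conf"

# Edit in place and keep the original as app.conf.bak.
sed -i.bak 's/red/blue/' "$workdir/app.conf"

edited=$(cat "$workdir/app.conf")
backup=$(cat "$workdir/app.conf.bak")

echo "$edited"    # color=blue
echo "$backup"    # color=red
rm -rf "$workdir"
```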

3.6 The matched-string marker (&)

In sed, & stands for the string matched by the pattern, so the matched text can be reused in the replacement string.

jianliu@ubuntu:~/aa$ echo this is an example | sed 's/\w\+/[&]/g'
[this] [is] [an] [example]

#The regular expression \w\+ matches each word, and [&] replaces it. & stands for the word that was just matched.
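Because & is special in the replacement, a literal ampersand has to be written as \&. A brief sketch with made-up inputs, using only POSIX basic regular expressions:

```shell
# & inserts the matched digits between the quotes.
quoted=$(echo 'port 8080' | sed 's/[0-9][0-9]*/"&"/')

# \& inserts a literal ampersand instead of the match.
literal=$(echo 'a b' | sed 's/a/\&/')

echo "$quoted"    # port "8080"
echo "$literal"   # & b
```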

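3.7 Substring match markers (\number)

& stands for the whole string matched by the pattern, but it is also possible to capture only part of the match.
\(pattern\) captures a substring: the sub-pattern is enclosed in parentheses escaped with backslashes. The first captured substring is referenced as \1, the second as \2, and so on.

#Replace "digit 7" with "7"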
jianliu@ubuntu:~/aa$ echo this is digit 7 in a number | sed 's/digit \([0-9]\)/\1/'
this is 7 in a number

With several captured substrings

#Swap two words
jianliu@ubuntu:~/aa$ echo seven EIGHT | sed 's/\([a-z]\+\) \([A-Z]\+\)/\2 \1/'
EIGHT seven

#\([a-z]\+\) matches the first word and \([A-Z]\+\) matches the second; \1 and \2 refer to them. Such references
#are called back references. The replacement lists them as \2 \1, so the two words come out swapped.

jianliu@ubuntu:~/aa$ echo seven EIGHT | sed 's/\([a-z]\) \([A-Z]\)/\2 \1/'
seveE nIGHT
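Back references are not limited to two groups. As a further illustration, the sketch below reorders a date with three groups. The input date is arbitrary, and the second command assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), where the parentheses need no backslashes.

```shell
# Basic regular expressions: groups are written \( ... \).
reordered=$(echo '2023-07-22' | sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\3.\2.\1/')

# Extended regular expressions: plain ( ... ) and + work directly.
reordered_ere=$(echo '2023-07-22' | sed -E 's/([0-9]+)-([0-9]+)-([0-9]+)/\3.\2.\1/')

echo "$reordered"       # 22.07.2023
echo "$reordered_ere"   # 22.07.2023
```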

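3.8 Combining multiple expressions with -e or a pipe

Chain several sed commands with pipes:
sed 'expression' | sed 'expression'
Separate expressions with a semicolon:
sed 'expression1; expression2'
Use the -e option:
sed -e 'expression' -e 'expression'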

jianliu@ubuntu:~/aa$ echo abc | sed 's/a/A/' | sed 's/c/C/'
AbC

jianliu@ubuntu:~/aa$ echo abc | sed 's/a/A/; s/c/C/'
AbC

jianliu@ubuntu:~/aa$ echo abc | sed -e 's/a/A/' -e 's/c/C/'
AbC
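Longer scripts can also be kept in a file and loaded with -f. The sketch below writes a temporary script file (its name comes from mktemp) and additionally shows that commands run in order, each one seeing the result of the previous one.

```shell
# Put one command per line into a script file.
script_file=$(mktemp)
cat > "$script_file" <<'EOF'
s/a/A/
s/c/C/
EOF

from_file=$(echo abc | sed -f "$script_file")

# The second command sees the b produced by the first one.
chained=$(echo abc | sed 's/a/b/; s/b/X/')

echo "$from_file"   # AbC
echo "$chained"     # Xbc
rm -f "$script_file"
```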

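3.9 Quoting

sed expressions are usually wrapped in single quotes, but double quotes work as well.
Inside double quotes the shell expands the expression before sed sees it.

This is useful when the sed expression needs to contain a shell variable.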

jianliu@ubuntu:~/aa$ text=hello
jianliu@ubuntu:~/aa$ echo hello world | sed "s/$text/HELLO/"
HELLO world
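Two details are worth keeping in mind here: single quotes prevent the expansion, and a variable whose value contains / would clash with the default delimiter. The sketch below illustrates both; the variable names and values are invented for the example.

```shell
text=hello
old_dir=/usr/bin
new_dir=/opt/bin

# Single quotes: the shell leaves $text alone, so nothing in the input matches.
not_expanded=$(echo hello world | sed 's/$text/HELLO/')

# Double quotes plus a | delimiter, because the values contain slashes.
expanded=$(echo 'PATH_ENTRY=/usr/bin' | sed "s|$old_dir|$new_dir|")

echo "$not_expanded"   # hello world
echo "$expanded"       # PATH_ENTRY=/opt/bin
```

Any characters in the variable that are special to sed (such as the delimiter, & or \) are still interpreted by sed after expansion, so values from untrusted sources need escaping first.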
