SECTION 22 Getting Started with sed and gawk

壹只菜鸟

Original article: https://blog.csdn.net/u010230019/article/details/128190672

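Linux provides two widely used command-line text-processing tools, sed and gawk. Neither is interactive: both apply a prepared set of commands to text, which makes them suitable for automatically formatting, inserting, or deleting text elements.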

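The sed editor

The sed editor is a stream editor. Unlike an interactive editor, it edits a data stream according to a set of rules that you supply before it starts processing the data.
sed processes the data in the stream according to commands that are either entered on the command line or stored in a command text file.
For each line of input, the sed editor does the following:

1. Reads one line of data from the input
2. Matches that data against the supplied editor commands
3. Modifies the data in the stream as the commands specify
4. Writes the new data to STDOUT

After the stream editor has matched all commands against one line, it reads the next line and repeats the process. It stops once every line in the stream has been processed.
The sed command format is: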

sed options script file

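options:
-e script  Add the commands in script to the commands run while processing the input
-f file    Add the commands in file to the commands run while processing the input
-n         Do not print each line automatically; use the p (print) command to produce output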
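
As a small illustration of the -n option (the sample lines are made up for this sketch): with -n, sed prints nothing on its own, so only lines printed explicitly, here by the p flag of the s command, reach the output.

```shell
# Without -n, sed prints every line after applying the command.
all=$(printf 'alpha\nbeta\n' | sed 's/beta/gamma/')
printf '%s\n' "$all"

# With -n, automatic printing is off; the p flag prints only the
# lines on which a substitution actually happened.
changed=$(printf 'alpha\nbeta\n' | sed -n 's/beta/gamma/p')
printf '%s\n' "$changed"
```

The first command prints both lines; the second prints only the changed line.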

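Defining editor commands on the command line
By default, sed applies the specified commands to the STDIN input stream, so data can be piped straight into the sed editor, for example: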

echo "this is a test"|sed 's/test/big test/'
this is a big test

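The s (substitute) command replaces text matching the first string between the slashes with the second string. sed can also read its input from a file, for example: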

]# cat data1.txt 
1the	quick brown fox jumps over the lazy dog
2the quick brown fox jumps over the lazy dog
3the quick brown fox jumps over the lazy dog
4the quick brown fox jumps over the lazy dog
5the quick brown fox jumps over the lazy dog
]# sed 's/dog/cat/' data1.txt 
1the	quick brown fox jumps over the lazy cat
2the quick brown fox jumps over the lazy cat
3the quick brown fox jumps over the lazy cat
4the quick brown fox jumps over the lazy cat
5the quick brown fox jumps over the lazy cat
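
One detail the example above does not exercise: by default the s command replaces only the first match on each line. The sketch below, using a made-up sample line, contrasts that default with the g (global) flag, which replaces every match on the line.

```shell
line="the dog chased the other dog"

# Default: only the first "dog" on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/dog/cat/')

# With the g flag: every "dog" on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/dog/cat/g')

printf '%s\n%s\n' "$first_only" "$every"
```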

Using multiple editor commands on the command line

]# sed -e 's/dog/cat/; s/brown/green/' data1.txt 
1the	quick green fox jumps over the lazy cat
2the quick green fox jumps over the lazy cat
3the quick green fox jumps over the lazy cat
4the quick green fox jumps over the lazy cat
5the quick green fox jumps over the lazy cat

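To run several commands from the command line, use the -e option and separate the commands with semicolons. The usual advice is to leave no space between the end of a command and the semicolon, although in testing here a space caused no problem.
If you would rather not use semicolons, you can use the bash secondary prompt instead: press Enter after the opening single quote and type one command per line, closing the quote after the last command, for example: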

]# sed -e '
> s/dog/cat/
> s/brown/red/
> s/fox/elephant/' data1.txt
1the	quick red elephant jumps over the lazy cat
2the quick red elephant jumps over the lazy cat
3the quick red elephant jumps over the lazy cat
4the quick red elephant jumps over the lazy cat
5the quick red elephant jumps over the lazy cat
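
A third way to pass several commands, sketched here on a single sample line rather than on data1.txt, is to repeat the -e option once per command. sed applies the commands to each line in the order given.

```shell
# Each -e adds one command to the script that sed runs on every line.
out=$(echo "the quick brown fox jumps over the lazy dog" |
    sed -e 's/dog/cat/' -e 's/brown/green/')
printf '%s\n' "$out"
```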

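Reading editor commands from a file
If you have many sed commands to run, it is more convenient to put them in a separate file and load that file with the -f option: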

]# cat script1.sed 
s/brown/green/
s/fox/elephant/
s/dog/cat/
]# sed -f script1.sed data1.txt 
1the	quick green elephant jumps over the lazy cat
2the quick green elephant jumps over the lazy cat
3the quick green elephant jumps over the lazy cat
4the quick green elephant jumps over the lazy cat
5the quick green elephant jumps over the lazy cat

It is a good idea to give sed script files a .sed extension so they are easy to tell apart from other files.
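
The same -f workflow can be tried without leaving files behind. This is a minimal sketch that assumes mktemp is available; the temporary file name and the sample line are illustrative only.

```shell
# Write the sed commands to a throwaway file, one command per line.
script=$(mktemp)
cat > "$script" <<'EOF'
s/brown/green/
s/dog/cat/
EOF

# -f tells sed to read its commands from the file.
out=$(echo "the quick brown fox jumps over the lazy dog" | sed -f "$script")
printf '%s\n' "$out"

# Clean up the temporary script file.
rm -f "$script"
```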

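The gawk program

gawk is a more advanced tool for processing data in files. It provides a programming-like environment for modifying and reorganizing that data.
With the gawk programming language, you can:

1. Define variables to store data
2. Use arithmetic and string operators to operate on data
3. Use structured programming concepts (such as if-then statements and loops) to add logic to the data processing
4. Generate formatted reports by extracting data elements from a data file and rearranging or reformatting them

The gawk command format is: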
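
The capabilities listed above can be sketched in a few lines. The input numbers and the variable names total and big are made up for illustration, and the AWK fallback line is an addition for systems where gawk is not installed; only features common to all awk implementations are used.

```shell
# Use gawk if present, otherwise any available awk (portability assumption).
AWK=$(command -v gawk || command -v awk)

out=$(printf '3\n8\n5\n' | "$AWK" '
{
    total = total + $1      # a variable plus arithmetic
    if ($1 > 4) {           # structured logic: count values above 4
        big = big + 1
    }
}
END {
    # runs once after all input lines have been read
    print "total=" total, "big=" big
}')
printf '%s\n' "$out"
```

With the three sample values this prints total=16 big=2.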

gawk options program file

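options:

-F fs         Specify the field separator that delimits the data fields in a line
-f file       Read the program from the specified file
-v var=value  Define a variable in the gawk program and set its initial value
-mf N         Specify the maximum number of fields to process in the data file (a legacy option that current GNU awk ignores)
-W keyword    Specify a compatibility mode or warning level for gawk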
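
As a quick illustration of the -v option (the variable name greeting and the input are made up for this sketch), a value set on the command line is available inside the program as an ordinary variable:

```shell
# Use gawk if present, otherwise any available awk (portability assumption).
AWK=$(command -v gawk || command -v awk)

# -v assigns the variable before the program starts running.
out=$(echo "world" | "$AWK" -v greeting="hello" '{print greeting, $1}')
printf '%s\n' "$out"
```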

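Reading the program script from the command line
A gawk program script is enclosed in a pair of braces, and the whole script is wrapped in single quotes so that the shell passes it through unchanged, for example: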

]# gawk '{print "hello world"}'
this is a test
hello world

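gawk runs the script against every line of text in the data stream. No file was named in the example above, so gawk waits for input on STDIN; the script ignores the content of each line it reads and simply prints the fixed text. Press Ctrl+D, which signals end-of-file (EOF), to end the input, or Ctrl+C to interrupt the program.

Using data field variables
One of gawk's main features is its ability to work with the data in text files. It automatically assigns a variable to each data element in a line. By default, gawk assigns the following variables:

$0 represents the entire line of text
$1 represents the first data field in the line
$n represents the nth data field in the line

Data fields are delimited by a field separator. When gawk reads a line of text, it splits the line into fields using the defined field separator, which by default is any whitespace character (a space or a tab), for example: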

]# cat data2.txt 
1the quick brown elephntx jumps over the lazy dog
2the quick brown fox jumps over the lazy dog
3the quick brown fox jumps over the lazy dog
4the quick brown fox jumps over the lazy dog
5the quick brown fox jumps over the lazy dog
]# gawk '{print $1}' data2.txt 
1the
2the
3the
4the
5the
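
To see the field variables side by side, the following sketch feeds gawk one made-up line that mixes a space and a tab as separators. Both count as whitespace, so the line has three fields.

```shell
# Use gawk if present, otherwise any available awk (portability assumption).
AWK=$(command -v gawk || command -v awk)

# The input is "one two<TAB>three"; print field 3, field 2, then the whole line.
out=$(printf 'one two\tthree\n' | "$AWK" '{print $3; print $2; print $0}')
printf '%s\n' "$out"
```

The last line of output is the untouched input line, tab included, because $0 has not been modified.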

Use the -F option to specify a different field separator:

]# gawk -F: '{print $1}' /etc/passwd
root
bin
daemon
adm
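
The separator does not have to be a colon. A minimal sketch with comma-separated sample data (the record is invented for illustration):

```shell
# Use gawk if present, otherwise any available awk (portability assumption).
AWK=$(command -v gawk || command -v awk)

# -F, splits each line on commas instead of whitespace.
out=$(echo "alice,30,paris" | "$AWK" -F, '{print $1 " lives in " $3}')
printf '%s\n' "$out"
```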

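Using multiple commands in the program script
To use more than one command in a script on the command line, put a semicolon between them: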

]# echo "my name is niuniu"|gawk '{$4="NIUNIU";print $0}'
my name is NIUNIU

As with sed, you can also use the secondary prompt to enter the commands one per line:

]# echo "my name is niuniu"|gawk '{
> $4="NIUNIU"
> print $0}'
my name is NIUNIU

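Reading the program from a file
As with sed, a gawk program can be stored in a file and loaded with the -f option: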

]# cat script2.gawk 
{
# The trailing space keeps the directory from running into "is"
text = "'s home directory is "
print $1 text $6
}
]# gawk -F: -f script2.gawk /etc/passwd
root's home directory is /root
bin's home directory is /bin
daemon's home directory is /sbin
adm's home directory is /var/adm
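
The same pattern can be tried without reading /etc/passwd. This sketch assumes mktemp is available; the program file, the label variable, and the key:value input are all illustrative.

```shell
# Use gawk if present, otherwise any available awk (portability assumption).
AWK=$(command -v gawk || command -v awk)

# Store the program in a throwaway file.
prog=$(mktemp)
cat > "$prog" <<'EOF'
{
    label = " -> "
    print $1 label $2
}
EOF

# -F: sets the separator, -f names the program file.
out=$(echo "key:value" | "$AWK" -F: -f "$prog")
printf '%s\n' "$out"

# Clean up the temporary program file.
rm -f "$prog"
```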

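Running scripts before processing data
gawk also lets you specify when the program script runs. By default, gawk reads a line of text from the input and then runs the program script against that line.
Sometimes, though, you need to run a script before any data is processed, for example to print a heading for a report. That is what the BEGIN keyword is for: it makes gawk run the script that follows BEGIN before it reads any data, for example: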

]# gawk 'BEGIN{print "hela"}'
hela

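This time gawk exits as soon as it has printed hela: the program contains only a BEGIN block, so there is no data to process. A BEGIN block can also be combined with a main script: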

]# gawk 'BEGIN {print "BEGIN:"}
> {print $1}' data1.txt
BEGIN:
1the
2the
3the
4the
5the

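Running scripts after processing data
Like BEGIN, the END keyword lets you specify a program script, which gawk runs after it has finished processing the data.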

]# gawk 'BEGIN {print "BEGIN:"}
> {print $0}
> END {print "END."}' data1.txt
BEGIN:
1the	quick brown fox jumps over the lazy dog
2the quick brown fox jumps over the lazy dog
3the quick brown fox jumps over the lazy dog
4the quick brown fox jumps over the lazy dog
5the quick brown fox jumps over the lazy dog
END.
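
BEGIN and END become more useful when the main script accumulates something in between. The sketch below counts input lines; the variable name count and the three sample lines are made up for illustration.

```shell
# Use gawk if present, otherwise any available awk (portability assumption).
AWK=$(command -v gawk || command -v awk)

out=$(printf 'a\nb\nc\n' | "$AWK" '
BEGIN { print "start" }          # runs once, before any input is read
{ count = count + 1 }            # runs once per input line
END { print "lines: " count }    # runs once, after the last line
')
printf '%s\n' "$out"
```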

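A complete report example
The script below sets the field separator by assigning the FS variable in the BEGIN block instead of using the -F option: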

]# cat script3.gawk 
BEGIN{
print "The latest list of users and shells"
print " UserID \t Shell"
print "---------\t---------"
FS=":"
}
{
print $1 "	\t "$7
}
END{
print "This concludes the listing"
}
]# gawk -f script3.gawk /etc/passwd
The latest list of users and shells
 UserID 	 Shell
---------	---------
root		 /bin/bash
bin		 /sbin/nologin
daemon		 /sbin/nologin
adm		 /sbin/nologin
...
ycp		 /bin/bash
This concludes the listing
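
The report script sets FS inside BEGIN where earlier examples used -F. The sketch below suggests the two are interchangeable for a simple separator, using a single sample line in /etc/passwd format rather than the real file.

```shell
# Use gawk if present, otherwise any available awk (portability assumption).
AWK=$(command -v gawk || command -v awk)

# Sample record in /etc/passwd format (illustrative, not read from the system).
line="root:x:0:0:root:/root:/bin/bash"

# Separator given on the command line.
with_flag=$(echo "$line" | "$AWK" -F: '{print $1, $7}')

# Separator assigned in BEGIN, before the first line is read.
with_begin=$(echo "$line" | "$AWK" 'BEGIN {FS=":"} {print $1, $7}')

printf '%s\n%s\n' "$with_flag" "$with_begin"
```

Setting FS in BEGIN keeps the separator inside the script file, so the caller does not need to remember the right -F value.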