Shell text processing: grep | sed | awk

Latest recommended article published on 2023-08-13 20:12:37

话多必失丶


Article link: https://blog.csdn.net/u010489158/article/details/87660165


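I. Introduction

1. Regular expressions and wildcards

Regular expressions match strings inside files that satisfy a pattern, and the match is a "contains" match: the pattern only has to occur somewhere in the line. Commands such as grep, awk, and sed support regular expressions.

Wildcards match file names, and the match is a full match: the pattern has to cover the entire name.

Commands such as ls and cp do not interpret regular expressions, so they rely on the shell's own wildcards (globs). find also matches names with glob patterns through -name; GNU find additionally accepts regular expressions through -regex.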
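
The difference between the two kinds of matching can be seen in a small sketch. The scratch directory and the file names abc and xabcx are made up for illustration and do not come from the article.

```shell
# Hypothetical scratch directory and file names, for illustration only
dir=$(mktemp -d)
touch "$dir/abc" "$dir/xabcx"

# Wildcard: abc* must cover the whole file name, so only "abc" qualifies
set -- "$dir"/abc*
glob_hits=$#

# Regex: abc only has to occur somewhere in the line, so both names qualify
regex_hits=$(ls "$dir" | grep -c 'abc')

echo "glob: $glob_hits  regex: $regex_hits"
rm -r "$dir"
```

To get a full match from grep, the pattern would need anchors, for example '^abc$'.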

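2. Basic regular expressions

[Image not recovered: table of basic regular expression metacharacters]

3. Common character notations

[Image not recovered: table of common character notations]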

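II. Usage

1. The grep command

Options

grep accepts options that change its behavior:

-A # : print each matching line plus the # lines after it

-B # : print each matching line plus the # lines before it

-C # : print each matching line plus # lines before and after it

-E : use extended regular expressions (equivalent to egrep)

-o : print only the matched text, not the rest of the line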
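
A minimal sketch of three of these options, using a made-up four-line input (the sample text and the helper function sample are assumptions for illustration):

```shell
# Hypothetical four-line input, for illustration only
sample() { printf 'alpha\nerror 42\nbeta\ngamma\n'; }

after=$(sample | grep -A 1 'error')          # matching line plus one line after it
only=$(sample | grep -o '[0-9][0-9]*')       # only the matched text
either=$(sample | grep -E -c 'alpha|gamma')  # ERE alternation, counting matching lines

printf '%s\n' "$after" "$only" "$either"
```

-c is not in the list above; it is used here only to turn the result into a count.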

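Usage

"*" means the preceding character occurs zero or more times

grep "a*" test_rule.txt  # matches every line, including blank lines

grep "aa*" test_rule.txt # matches lines containing at least one a

grep "aaa*" test_rule.txt  # matches lines containing at least two consecutive a's

grep "aaaaa*" test_rule.txt # matches lines containing at least four consecutive a's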
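
The four patterns above can be checked against a tiny made-up input. The lines b, a, aa, and aaaa are assumptions chosen so that each pattern selects a different number of lines.

```shell
# Hypothetical input: one line without a, then 1, 2 and 4 consecutive a's
sample() { printf 'b\na\naa\naaaa\n'; }

n_star=$(sample | grep -c 'a*')      # every line, because zero a's also counts
n_one=$(sample | grep -c 'aa*')      # lines with at least one a
n_two=$(sample | grep -c 'aaa*')     # lines with at least two consecutive a's
n_four=$(sample | grep -c 'aaaaa*')  # lines with at least four consecutive a's

echo "$n_star $n_one $n_two $n_four"
```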

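"." matches any single character except a newline

grep "s..d" test_rule.txt  # matches strings with exactly two characters between s and d

grep "s.*d" test_rule.txt  # matches strings with any characters (including none) between s and d

grep ".*" test_rule.txt  # matches every line

"^" matches the start of a line, "$" matches the end of a line

grep "^M" test_rule.txt # matches lines starting with an uppercase "M"

grep "n$" test_rule.txt # matches lines ending with a lowercase "n"

grep -n "^$" test_rule.txt # matches empty lines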
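
A short sketch of the dot and the two anchors on a made-up input (the five sample lines are assumptions, not the contents of test_rule.txt):

```shell
# Hypothetical input; the second line is empty
sample() { printf 'Mon\n\nsaid\nsd\nsun\n'; }

starts_m=$(sample | grep -c '^M')     # Mon
ends_n=$(sample | grep -c 'n$')       # Mon, sun
empty=$(sample | grep -c '^$')        # the empty line
two_between=$(sample | grep -c 's..d')  # said only: exactly two characters between s and d
any_between=$(sample | grep -c 's.*d')  # said and sd: zero or more characters between s and d

echo "$starts_m $ends_n $empty $two_between $any_between"
```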

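"[]" matches any one of the characters listed inside the brackets, and only one character

grep "s[ao]id" test_rule.txt # matches said or soid: the character between s and i is either a or o

grep "[0-9]" test_rule.txt # matches lines containing any digit

grep "^[a-z]" test_rule.txt # matches lines starting with a lowercase letter

"[^]" matches any one character that is not listed inside the brackets

grep "^[^a-z]" test_rule.txt  # matches lines that do not start with a lowercase letter

grep "^[^a-zA-Z]" test_rule.txt # matches lines that do not start with a letter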

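"\" is the escape character

grep "\.$" test_rule.txt  # matches lines ending with a literal "."

"\{n\}" means the preceding character occurs exactly n times (in basic regular expressions the braces are written with backslashes)

grep "a\{3\}" test_rule.txt # matches strings in which a occurs three times in a row

grep "[0-9]\{3\}" test_rule.txt # matches strings containing three consecutive digits

"\{n,\}" means the preceding character occurs at least n times

grep "^[0-9]\{3,\}[a-z]" test_rule.txt # matches lines that start with at least three consecutive digits followed by a lowercase letter

"\{n,m\}" means the preceding character occurs at least n and at most m times

grep "sa\{1,3\}i" test_rule.txt # matches strings with one to three a's between s and i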
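
The interval syntax can be tried on a made-up input. The sample lines are assumptions; the sketch also shows the same pattern written for grep -E, where the braces need no backslashes.

```shell
# Hypothetical input: s, then 0, 1, 3 and 4 a's, then i
sample() { printf 'si\nsai\nsaaai\nsaaaai\n'; }

bre=$(sample | grep -c 'sa\{1,3\}i')     # basic regex: sai and saaai
ere=$(sample | grep -E -c 'sa{1,3}i')    # extended regex, same meaning
three=$(sample | grep -c 'a\{3\}')       # any line that contains aaa somewhere

echo "$bre $ere $three"
```

Because grep matches substrings, a\{3\} also selects the line with four a's: three of them in a row are enough.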

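2. The sed command

sed is a stream editor. It is a very useful text-processing tool that works well with regular expressions. sed reads one line at a time into a temporary buffer called the pattern space, applies the sed commands to the contents of that buffer, and then sends the result to the screen. It then moves on to the next line and repeats until the end of the file. The file itself is not changed unless you redirect the output and store it. sed is mainly used to edit one or more files automatically, to simplify repetitive edits, and to write conversion programs.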
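
A minimal sketch of that behavior, assuming two scratch files created with mktemp (the file contents are made up for illustration):

```shell
# Hypothetical scratch files, for illustration only
src=$(mktemp)
dst=$(mktemp)
printf 'my cat\nmy dog\n' > "$src"

shown=$(sed 's/my/your/' "$src")   # the edited text goes to standard output
sed 's/my/your/' "$src" > "$dst"   # redirect the output to keep the result
original=$(cat "$src")             # the input file itself is untouched
saved=$(cat "$dst")

printf '%s\n' "$original" "$saved"
rm -f "$src" "$dst"
```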

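sed command format and options

Command format

sed [options] 'command' file(s)
sed [options] -f scriptfile file(s)

Options

Option	Long form	Description
-e script	--expression=script	add the given script to the commands applied to the input
-f script-file	--file=script-file	read the commands from the given script file
-h	--help	show help
-n	--quiet, --silent	suppress automatic printing; only lines the script prints explicitly are shown
-V	--version	show version information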

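sed commands

Command	Description
d	delete the pattern space (the selected line) and start the next cycle
D	delete the pattern space up to and including its first newline
s	substitute text matching a pattern
h	copy the pattern space into the hold space
H	append the pattern space to the hold space
g	copy the hold space into the pattern space, replacing its contents
G	append the hold space to the pattern space
l	print the pattern space in an unambiguous form, making non-printable characters visible
n	read the next input line into the pattern space and continue with the following command instead of restarting the script
N	append a newline and the next input line to the pattern space, advancing the current line number
p	print the pattern space
P	print the pattern space up to its first newline
q	quit sed
b label	branch to the command marked with label; if label is omitted, branch to the end of the script
r file	read file and append its contents to the output after the current line
t label	branch to label if a substitution has succeeded since the last input line was read or the last conditional branch; if label is omitted, branch to the end of the script
T label	branch to label if no substitution has succeeded since the last input line was read or the last conditional branch (a GNU extension); if label is omitted, branch to the end of the script
w file	write the pattern space to file
W file	write the pattern space up to its first newline to file (a GNU extension)
!	apply the command to the lines that the address does not select
=	print the current line number
#	start a comment that runs to the next newline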

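sed substitution flags

Flag	Description
g	replace every match on the line
p	print the line if a substitution was made
w	write the line to a file if a substitution was made
x	exchange the pattern space and the hold space (a standalone command, not a flag of s)
y	transliterate one set of characters into another (a standalone command; it does not use regular expressions)
\1	back-reference to the first captured group
&	the whole matched string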

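Using sed

sed [options] 'AddressCommand' file(s)
sed [options] -f scriptfile file(s)
Address can take one of the following five forms:

start line,end line
Example: 1,100 selects lines 1 through 100
/regex/
Example: /^root/ selects lines that start with root
/regex1/,/regex2/
Example: from a line matching regex1 through the next line matching regex2
Line number
Example: 1 selects exactly the first line
start line,+N
Example: 1,+10 selects line 1 and the 10 lines after it (a GNU sed extension)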
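
The five address forms can be sketched on a made-up input of the numbers 1 to 6. The helper names nums and join are assumptions, and the last form is expected to work only with GNU sed.

```shell
# Hypothetical input: the numbers 1 to 6, one per line
nums() { printf '%s\n' 1 2 3 4 5 6; }
join() { paste -s -d ' ' -; }   # put the selected lines on one line for display

by_range=$(nums | sed -n '2,4p' | join)        # start line,end line
by_regex=$(nums | sed -n '/3/p' | join)        # /regex/
by_two_regex=$(nums | sed -n '/2/,/5/p' | join)  # /regex1/,/regex2/
by_line=$(nums | sed -n '6p' | join)           # line number
by_offset=$(nums | sed -n '2,+2p' | join)      # start line,+N (GNU sed)

printf '%s\n' "$by_range" "$by_regex" "$by_two_regex" "$by_line" "$by_offset"
```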

Contents of the sample text file

[root@localhost opt]# cat text.txt 
###MacBook-Pro:tmp maxincai$ cat test.txt

#my cat's name is betty

####my dog's name is frank

This is your fish

Remove the # characters at the start of each line, however many there are

[root@localhost opt]# cat text.txt |sed 's/^#*//g'
MacBook-Pro:tmp maxincai$ cat test.txt

my cat's name is betty

my dog's name is frank

This is your fish

Take columns 1 and 3 of /etc/passwd and replace the ":" between them with "-UUID : "

[root@localhost opt]# cat /etc/passwd|cut -d : -f1,3 |sed 's/:/-UUID : /'
root-UUID : 0
bin-UUID : 1
daemon-UUID : 2
adm-UUID : 3
lp-UUID : 4
sync-UUID : 5
shutdown-UUID : 6

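The Ng flag makes sed replace the Nth match on each line and every match after it (combining a number with g is a GNU sed extension)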

[root@localhost opt]# echo  hellohellohello | sed 's/hello/HELLO/2g'
helloHELLOHELLO
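
For comparison, a sketch of the number flag alone, the combined 2g form, and plain g on the same input. The 2g line assumes GNU sed.

```shell
second_only=$(echo hellohellohello | sed 's/hello/HELLO/2')   # only the 2nd match
from_second=$(echo hellohellohello | sed 's/hello/HELLO/2g')  # 2nd match onward (GNU sed)
every=$(echo hellohellohello | sed 's/hello/HELLO/g')         # every match

printf '%s\n' "$second_only" "$from_second" "$every"
```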

Delete the lines selected by an address or pattern

[root@localhost opt]# cat text.txt |sed '1,2d'  # delete lines 1 through 2
#my cat's name is betty

####my dog's name is frank

[root@localhost opt]# cat text.txt |sed '/^#/d'  # delete the lines starting with #



This is your fish

[root@localhost opt]# cat text.txt |sed -n '/Mac/,/cat/p'   # print from the line containing Mac through the next line containing cat
###MacBook-Pro:tmp maxincai$i cat test.txt

#my cat's name is betty

[root@localhost opt]# cat text.txt |sed '1d'   # delete the first line

#my cat's name is betty

####my dog's name is frank

This is your fish
[root@localhost opt]# cat text.txt |sed '1,+1d'   # delete the first line and the line after it
#my cat's name is betty

####my dog's name is frank

This is your fish

Delete empty lines

[root@localhost opt]# cat text.txt  |sed '/^$/d'
###MacBook-Pro:tmp maxincai$i cat test.txt
#my cat's name is betty
####my dog's name is frank
This is your fish

The matched-string marker (&)

[root@localhost opt]# echo this is test |sed 's/\w\+/[&]/g'
[this] [is] [test]
[root@localhost opt]# echo this is test | sed 's/[[:lower:]]\+/[&]/g'
[this] [is] [test]
[root@localhost opt]# echo this is test | sed 's/[[:lower:]]*/[&]/g'
[this] [is] [test]

Sub-match markers (\1, \2)

\w\+ means a word character occurring one or more times
[[:lower:]]* means a lowercase letter occurring zero or more times

[root@localhost opt]# echo aaa bbb|sed 's/\([[:lower:]]*\) \([[:lower:]]*\)/\2 \1/g'
bbb aaa
[root@localhost opt]# echo aaa bbb |sed 's/\(\w\+\) \(\w\+\)/\2 \1/g'
bbb aaa
[root@localhost opt]# echo aaa bbb |sed 's/\(\w*\) \(\w*\)/\2 \1/g'
bbb aaa
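
Two more uses of the markers, as a sketch on made-up input strings (the date and the key=value text are illustrative assumptions):

```shell
# \1 \2 \3 refer to the first, second and third \( ... \) group
reordered=$(echo '2023-08-13' |
    sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\3.\2.\1/')

# & stands for everything the pattern matched, here "key="
wrapped=$(echo 'key=value' | sed 's/[a-z]*=/<&>/')

printf '%s\n' "$reordered" "$wrapped"
```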

Using -e

[root@localhost opt]# cat text.txt  |sed -e '1d' -e '2i this is test'
this is test

#my cat's name is betty

####my dog's name is frank

This is your fish

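Quoting
sed expressions are usually wrapped in single quotes, but double quotes work too. The shell expands the contents of double quotes before it invokes sed, so double quotes are the way to use a shell variable inside a sed expression.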

[root@localhost opt]# test=1
[root@localhost opt]# sed "1i $test" text.txt 
1
###MacBook-Pro:tmp maxincai$i cat test.txt

#my cat's name is betty

####my dog's name is frank

This is your fish
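
The effect of the two quoting styles on a substitution can be sketched as follows. The variable name and the input text are made up for illustration.

```shell
name=frank   # hypothetical shell variable

expanded=$(echo 'my dog' | sed "s/dog/$name/")  # the shell substitutes $name first
literal=$(echo 'my dog' | sed 's/dog/$name/')   # sed receives the text $name as-is

printf '%s\n' "$expanded" "$literal"
```

If the variable's value can contain / or &, the expanded expression changes meaning, so a different delimiter or escaping is needed in that case.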

sed editing modes

Mode	Effect
p	print
d	delete
a	append
c	change (replace the line)
w	write to a file
i	insert

1. p mode - print

[root@localhost opt]# cat text.txt |sed '/^T/p'   # without -n the pattern space is printed as well, so the matching line appears twice
###MacBook-Pro:tmp maxincai$i cat test.txt

#my cat's name is betty

####my dog's name is frank

This is your fish
This is your fish
[root@localhost opt]# cat text.txt |sed -n '/^T/p'   # -n suppresses the automatic printing of the pattern space
This is your fish

2. d mode - delete

[root@localhost opt]# cat text.txt |sed '1d' # delete the first line

#my cat's name is betty

####my dog's name is frank

This is your fish

3. a mode - append (after the selected line)

[root@localhost opt]# cat text.txt |sed '/^#/a this is test'  # append this is test after every line starting with #
###MacBook-Pro:tmp maxincai$i cat test.txt
this is test

#my cat's name is betty
this is test

####my dog's name is frank
this is test

This is your fish

4. i mode - insert (before the selected line)

[root@localhost opt]# cat text.txt |sed '/^#/i this is test'
this is test
###MacBook-Pro:tmp maxincai$i cat test.txt

this is test
#my cat's name is betty

this is test
####my dog's name is frank

This is your fish

5. c mode - change

[root@localhost opt]# cat text.txt | sed '/^#/c this is test' # replace every line starting with # with this is test
this is test

this is test

this is test

This is your fish

6. w mode - write output to a file

[root@localhost opt]# cat text.txt | sed -n '/^#/w /opt/#.txt' # write the lines starting with # to the file #.txt
[root@localhost opt]# ls
text.txt  #.txt
[root@localhost opt]# cat \#.txt 
###MacBook-Pro:tmp maxincai$i cat test.txt
#my cat's name is betty
####my dog's name is frank

7. Append the contents of a file after a selected line

[root@localhost opt]# cat message 
#####################################
this is message

[root@localhost opt]# sed '$r /opt/message' text.txt 
###MacBook-Pro:tmp maxincai$i cat test.txt

#my cat's name is betty

####my dog's name is frank

This is your fish
#####################################
this is message

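3. AWK introduction and usage

AWK is an excellent text-processing tool and one of the most powerful data-processing engines available on Linux or in any other environment. The name of this programming and data-manipulation language comes from the initials of the surnames of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan. How much you get out of it depends largely on how well you know it. AWK offers pattern matching, flow control, arithmetic operators, process control statements, and built-in variables and functions, which is nearly everything expected of a complete language. AWK does in fact have its own language, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

Put simply, AWK is a programming language and tool for processing text. It resembles shell programming in many ways, although it has a syntax entirely its own. Its design drew on SNOBOL4, sed, the validation language designed by Marc Rochkind, the language tools yacc and lex, and of course some good ideas from C. AWK was originally created for text processing, and the language is built on one principle: whenever a pattern matches the input, run a series of instructions. The tool scans each line of a file, looking for the patterns given on the command line. When a line matches, the associated action runs; when it does not, processing moves on to the next line.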

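RS : record separator used when reading input (a newline by default)
ORS : record separator used when writing output
FS : field separator used when reading input
OFS : field separator used when writing output

NF: the number of fields in the current record

awk provides the built-in variables NR and FNR for working with multiple files.

NR: the total number of records (lines) read since awk started.

FNR: the number of records read from the current file. It is always less than or equal to NR: when awk moves on to the second file, FNR restarts at 1 while NR keeps counting.

NR==FNR: true only while awk is reading the first file, so it is used to tell the first file apart when two or more files are given.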
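
A sketch of both counters and of the common NR==FNR idiom, assuming two scratch files with made-up user names:

```shell
# Hypothetical scratch files, for illustration only
f1=$(mktemp)
f2=$(mktemp)
printf 'root\nnginx\n' > "$f1"
printf 'bin\nroot\nadm\nnginx\n' > "$f2"

# NR keeps counting across files, FNR restarts at 1 for the second file
counters=$(awk '{print NR, FNR}' "$f1" "$f2")

# While reading the first file, remember its names; then print the lines
# of the second file whose first field was seen in the first file
common=$(awk 'NR==FNR {seen[$1]; next} $1 in seen' "$f1" "$f2")

printf '%s\n' "$counters" "$common"
rm -f "$f1" "$f2"
```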

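Usage: awk 'condition1{action1} condition2{action2} …' filename

awk -F separator 'BEGIN { initialization } { per-line processing } END { final processing }' file_list1 file_list2

BEGIN and END may be omitted, and -F may be left at its default. The main block runs once for each line of the input.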
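
The three parts can be seen working together in a small sketch. The colon-separated input is made up for illustration.

```shell
# Hypothetical input: name:value pairs
report=$(printf 'a:1\nb:2\nc:3\n' | awk -F : '
    BEGIN { print "name value" }        # runs once, before any input is read
          { sum += $2; print $1, $2 }   # runs for every input line
    END   { print "total", sum }        # runs once, after the last line
')

printf '%s\n' "$report"
```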

awk [options] 'script' file1 file2

awk [options] 'PATTERN {action}' file1 file2

df -h | awk '{print $1 "\t" $3}'

awk 'BEGIN{print "this is file"}{print $2}'   # prints "this is file" before any input is processed

awk 'END{print "this is file"}{print $2}'   # prints "this is file" after all input has been processed

awk 'BEGIN{FS="G"}{print $1 "\t" $2}'   # FS="G" sets the field separator to G

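Some common patterns

 awk '{print NR,NF,FILENAME}' file                ### for every line print NR (line number), NF (number of fields) and FILENAME (file name)
 awk 'BEGIN{print "name"}' file                   ### BEGIN block: runs once, before any input is processed
 awk 'END{print "WESTOS"}' file                   ### END block: runs once, after all input has been processed
 awk -F ":" '/\<bash$/{print $1}' file            ## with colon as the separator (set by -F), print column 1 of lines ending in the word bash (\< is a GNU awk extension)
 awk -F "[: ]+" '{print $1}' file                 ### split on runs of colons and spaces, print column 1
 awk 'BEGIN{a=1;print a+1}' file                  ### assign a value to a, then add to it
 awk '/bash\>/{a++}END{print a}' file             ### count the users who can log in (lines containing a word that ends in bash)
 awk -F ":" '/^root/{print}' file                 ### print the lines starting with root
 awk -F ":" '/^a|nologin$/{print $1,$7}' file     ### print columns 1 and 7 of lines starting with a or ending in nologin
 awk -F ":" '$6~/bin$/{print $1,$7}' file         ### with colon as the separator, print columns 1 and 7 of lines whose column 6 ends in bin
 awk -F ":" '$6!~/bin$/{print $1,$7}' file        ### with colon as the separator, print columns 1 and 7 of lines whose column 6 does not end in bin

 awk 'NR==2,NR==5{print}' file                    ### print lines 2 through 5
 awk '/a/,/b/{print}' file                        ### print from a line containing a through the next line containing b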

Print selected columns

[root@localhost opt]# cat message 
this is message
[root@localhost opt]# awk '{print $1,$3}' message 
this message
[root@localhost opt]# awk '{print $0}' message   # $0 is the whole line
this is message

Set the output field separator

[root@localhost opt]# awk 'BEGIN{OFS=":"}{print $1,$2}' message 
this:is
[root@localhost opt]# awk 'BEGIN{OFS=":"}{print $1,"####",$2,$3}' message  # add a literal string
this:####:is:message

Define a variable with -v

[root@localhost opt]# awk -v test="hello" 'BEGIN{print test}'
hello

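Formatted output with printf
%-10s left-aligns a string in a 10-character field; %10s right-aligns it (right alignment is the default, so no + is needed)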

this      is
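
A minimal sketch of both alignments. The input text and the trailing | (added only to make the padding visible) are assumptions for illustration.

```shell
# %-10s pads on the right, %10s pads on the left, both to 10 characters
left=$(echo 'this is' | awk '{printf "%-10s|\n", $1}')
right=$(echo 'this is' | awk '{printf "%10s|\n", $2}')

printf '%s\n' "$left" "$right"
```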

Set the field separator with -F

[root@localhost opt]# awk -F : '{printf "%d\n", $3}' /etc/passwd
0
1
2
3
4
5
6
7
8
10
11
12
13
14
99
81
113
69
499
170
173
68
42
38
48
498
89
497
74
72
500
27

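Select lines with a range pattern (from the line whose column 3 is 0 through the next line whose column 7 matches nologin)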

[root@localhost opt]# awk -F : '$3==0,$7~"nologin"{print $1,$3,$7}' /etc/passwd
root 0 /bin/bash
bin 1 /sbin/nologin

Add a header to the output

[root@localhost opt]# awk -F : 'BEGIN{print "USERNAME      ID     SHELL" }{printf "%-10s%-10s%-10s\n" ,$1,$3,$7}' /etc/passwd
USERNAME      ID     SHELL
root      0         /bin/bash 
bin       1         /sbin/nologin
daemon    2         /sbin/nologin
adm       3         /sbin/nologin
lp        4         /sbin/nologin
sync      5         /bin/sync 
shutdown  6         /sbin/shutdown
halt      7         /sbin/halt
mail      8         /sbin/nologin
uucp      10        /sbin/nologin
operator  11        /sbin/nologin
games     12        /sbin/nologin
gopher    13        /sbin/nologin
ftp       14        /sbin/nologin
nobody    99        /sbin/nologin
dbus      81        /sbin/nologin
usbmuxd   113       /sbin/nologin
vcsa      69        /sbin/nologin
rtkit     499       /sbin/nologin
avahi-autoipd170       /sbin/nologin
abrt      173       /sbin/nologin
haldaemon 68        /sbin/nologin
gdm       42        /sbin/nologin
ntp       38        /sbin/nologin
apache    48        /sbin/nologin
saslauth  498       /sbin/nologin
postfix   89        /sbin/nologin
pulse     497       /sbin/nologin
sshd      74        /sbin/nologin
tcpdump   72        /sbin/nologin
redhat    500       /bin/bash 
nginx     27        /bin/bash

Classify users with awk

[root@localhost opt]# awk -F : '{if ($1=="root") print $1,"Admin";else print $1 ,": Common User"}' /etc/passwd
root Admin
bin : Common User
daemon : Common User
adm : Common User
lp : Common User
sync : Common User
shutdown : Common User
halt : Common User
mail : Common User
uucp : Common User
operator : Common User
games : Common User
gopher : Common User
ftp : Common User
nobody : Common User
dbus : Common User
usbmuxd : Common User
vcsa : Common User
rtkit : Common User
avahi-autoipd : Common User
abrt : Common User
haldaemon : Common User
gdm : Common User
ntp : Common User
apache : Common User
saslauth : Common User
postfix : Common User
pulse : Common User
sshd : Common User
tcpdump : Common User
redhat : Common User
nginx : Common User

Print every field that is at least 4 characters long

[root@localhost opt]# awk -F : '{i=1;while (i<=NF) {if (length($i)>=4) print $i;i++;}}' /etc/passwd

root
root
/root
/bin/bash
/bin
/sbin/nologin
daemon
daemon
/sbin
/sbin/nologin
/var/adm
/sbin/nologin
/var/spool/lpd
/sbin/nologin
sync
sync
/sbin
/bin/sync
shutdown
shutdown
/sbin
/sbin/shutdown

{
    i = 1
    while (i <= NF) {
        if (length($i) >= 4)
            print $i
        i++
    }
}
The one-liner above is equivalent to this layout, which is easier to read and resembles C.

Count how many times each value of the last field (the login shell) occurs

[root@localhost opt]# awk -F : '{shell[$NF]++}END{for (A in shell){print A,shell[A]};}' /etc/passwd
/sbin/shutdown 1
/bin/bash 3
/sbin/nologin 26
/sbin/halt 1
/bin/sync 1

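{shell[$NF]++}
# shell is an associative array. NF is the number of fields on the line, so $NF is the last field; for "a b c", NF is 3 and $NF=$3=c. If the key is already in the array its count is incremented; otherwise the entry is created and set to 1
END   # runs after all input has been read
{for (A in shell)   # loop over the keys of the shell array
        {print A,shell[A]};   # print each key and its count
}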
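
The same counting technique on a made-up three-line input. The user names are assumptions, and sort is added because awk does not guarantee the order in which for (k in n) visits the keys.

```shell
# Hypothetical input in the user:shell form
counts=$(printf 'u1:/bin/bash\nu2:/sbin/nologin\nu3:/bin/bash\n' |
    awk -F : '{ n[$NF]++ } END { for (k in n) print k, n[k] }' | sort)

printf '%s\n' "$counts"
```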

Show the name and id of the users who can log in to the current system

awk -F ":" 'BEGIN{print "name id"}/\<bash$/{print $1" "$3}' /etc/passwd

Count the users who can log in:

awk 'BEGIN{n=0}/\<bash$/{n++}END{print n}' /etc/passwd

Users who can log in and whose home directory is not under /home

awk -F ":" '/\<bash$/&&$6!~/^\/home/{print $1}' /etc/passwd

Show lines 3 through 5 of the file

awk -F ":" 'NR>=3&&NR<=5{print}' /etc/passwd

Show lines 6 and 8 of the file

awk -F ":" 'NR==6||NR==8{print}' /etc/passwd

Extract the IP address

ifconfig eth0 |awk 'NR==2{print $2}'
