Linux Regular Expressions and Stream Editing: sed and awk

莫言静好、

Categories: Linux, Scripts/shell. Tags: Linux, regular expressions, sed, awk

Article link: https://blog.csdn.net/zhanglh046/article/details/50413405

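1. POSIX character classes for regular expressions on the command line

These names are only valid inside a bracket expression, for example [[:digit:]].

1 [:alnum:] is equivalent to a-zA-Z0-9

2 [:alpha:] is equivalent to a-zA-Z

3 [:digit:] is equivalent to 0-9

4 [:lower:] is equivalent to a-z

5 [:upper:] is equivalent to A-Z

6 [:punct:] stands for punctuation characters

7 [:blank:] stands for space and tab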
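The classes listed above can be tried out with a short sketch like the following. The sample data and the variable names are made up for illustration; the only thing taken from the text is the class names themselves.

```shell
# Made-up sample input, one token per line (the last line is a single space)
sample=$(printf '%s\n' 'abc' 'ABC' '123' 'a1!' ' ')

# Lines that consist only of lowercase letters (-x matches the whole line)
lower_only=$(printf '%s\n' "$sample" | grep -x '[[:lower:]]*')

# Number of lines that contain at least one digit
with_digit=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# Lines that contain a punctuation character
with_punct=$(printf '%s\n' "$sample" | grep '[[:punct:]]')

# Number of lines made up of spaces/tabs only
blank_only=$(printf '%s\n' "$sample" | grep -c '^[[:blank:]][[:blank:]]*$')

echo "lowercase only: $lower_only"
echo "lines with a digit: $with_digit"
echo "lines with punctuation: $with_punct"
echo "blank-only lines: $blank_only"
```

Note the double brackets: the outer pair opens the bracket expression, the inner pair names the class.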

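2. grep in detail

Command format: grep [options] 'pattern' file

Options:

-n: print the line number of each matching line

-c: print the number of matching lines (not the number of individual matches)

-i: ignore case

-v: print the lines that do not match the pattern

-A n: also print the n lines after each matching line (after)

-B n: also print the n lines before each matching line (before)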

[root@localhost nicky]# grep -n '[[:lower:]]or' new.txt

[root@localhost nicky]# grep -n '[[:upper:]]or' new.txt

[root@localhost nicky]# grep -n -A 2 'WorkbenchConfig.xml - Specifies the Endeca Configuration Repository' \

> new.txt
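As a rough illustration of the options above, the sketch below runs each of them against a small temporary file. The file contents and variable names are invented for this example and are not the author's new.txt.

```shell
# Build a throwaway input file with made-up content
tmp=$(mktemp)
printf '%s\n' 'alpha' 'Error: disk' 'beta' 'error: net' 'gamma' > "$tmp"

numbered=$(grep -n 'error' "$tmp")      # -n: prefix matches with line numbers
count_ci=$(grep -ci 'error' "$tmp")     # -c with -i: count matching lines, any case
inverted=$(grep -vi 'error' "$tmp")     # -v: lines that do NOT match
after=$(grep -A 1 'Error' "$tmp")       # -A 1: the match plus 1 line after it
before=$(grep -B 1 'Error' "$tmp")      # -B 1: the match plus 1 line before it

echo "$numbered"
echo "$count_ci"
echo "$inverted"
echo "$after"
echo "$before"

rm -f "$tmp"
```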

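3. Matching a set of characters

Scenario: find the strings fork or fosk

[root@localhost nicky]# grep -n 'fo[rs]k' new.txt

Scenario: find the strings fork or fask (alternation needs extended regular expressions, hence -E)

[root@localhost nicky]# grep -En 'f(or|as)k' new.txt

Scenario: find oo preceded by any character other than x

[root@localhost nicky]# grep -n '[^x]oo' new.txt

Scenario: find oo preceded by any character other than an uppercase letter

[root@localhost nicky]# grep -n '[^[:upper:]]oo' new.txt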
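The bracket sets and the alternation described in this section can be checked with a sketch along these lines. The word list is invented for illustration.

```shell
# Made-up word list, one word per line
words=$(printf '%s\n' fork fosk fask foork book xoo Zoo)

# [rs] matches exactly one character that is either r or s
set_match=$(printf '%s\n' "$words" | grep 'fo[rs]k')

# (or|as) is alternation; it needs extended regular expressions (-E)
alt_match=$(printf '%s\n' "$words" | grep -E 'f(or|as)k')

# [^x] matches one character that is NOT x, so xoo is skipped
not_x=$(printf '%s\n' "$words" | grep '[^x]oo')

# [^[:upper:]] matches one character that is not an uppercase letter
not_upper=$(printf '%s\n' "$words" | grep '[^[:upper:]]oo')

echo "$set_match"
echo "$alt_match"
echo "$not_x"
echo "$not_upper"
```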

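4. Start-of-line and end-of-line anchors

'^the' '!$'

'^the' matches lines that start with the; '!$' matches lines that end with !

Scenario: find lines that start with endeca

[root@localhost nicky]# grep -n '^endeca' new.txt

Scenario: find lines that end with !

[root@localhost nicky]# grep -n '!$' new.txt

Scenario: find lines that start with a lowercase letter followed by oo

[root@localhost nicky]# grep -n '^[[:lower:]]oo' new.txt

Scenario: find empty lines

[root@localhost nicky]# grep -n '^$' new.txt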
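A minimal sketch of the two anchors, using invented input text. It also combines both anchors in one pattern, which the scenarios above do not show.

```shell
# Made-up input; the third line is empty
text=$(printf '%s\n' 'the end!' 'not the start' '' 'endeca rocks' 'the middle')

# ^ anchors the pattern to the start of the line
starts_the=$(printf '%s\n' "$text" | grep -c '^the')

# $ anchors the pattern to the end of the line
ends_bang=$(printf '%s\n' "$text" | grep '!$')

# ^$ matches a line with nothing between start and end
empty_lines=$(printf '%s\n' "$text" | grep -n '^$')

# Both anchors at once: starts with "the" and ends with "!"
both=$(printf '%s\n' "$text" | grep '^the.*!$')

echo "$starts_the"
echo "$ends_bang"
echo "$empty_lines"
echo "$both"
```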

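5. The . and * characters in regular expressions

.: matches exactly one character, whatever it is

*: matches the preceding character 0 to N times

Scenario: find lines containing one o or several consecutive o's. The pattern has to be 'oo*'; a bare 'o*' also accepts zero o's and therefore matches every line

[root@localhost nicky]# grep -n 'oo*' new.txt

Scenario: find g and d with exactly two characters between them

[root@localhost nicky]# grep -n 'g..d' new.txt

Note the difference between * in a regular expression and the * wildcard used by the shell:

1 In a regular expression, * repeats the preceding character 0 to N times

2 As a shell wildcard, * stands for any 0 or more characters; it does not repeat anything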
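The difference between the regex * and the shell wildcard * can be seen side by side in a sketch like this. The word list is made up, and the shell side is shown with a case pattern, which uses the same wildcard rules as filename matching.

```shell
# Made-up word list
lines=$(printf '%s\n' 'good' 'gd' 'glad' 'god' 'xyz')

# 'o*' accepts zero o's, so every one of the 5 lines matches
all=$(printf '%s\n' "$lines" | grep -c 'o*')

# 'oo*' requires at least one o
one_or_more=$(printf '%s\n' "$lines" | grep -c 'oo*')

# Each . stands for exactly one character
g_dot_dot_d=$(printf '%s\n' "$lines" | grep 'g..d')

# Regex: g* means "zero or more g", so ^g*d$ does NOT match "good"
regex_hit=$(printf 'good\n' | grep -c '^g*d$' || true)

# Shell wildcard: * means "any characters", so g*d DOES match "good"
case good in
  g*d) glob_hit=yes ;;
  *)   glob_hit=no ;;
esac

echo "$all $one_or_more $regex_hit $glob_hit"
echo "$g_dot_dot_d"
```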

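6. Limiting the number of consecutive characters

Scenario: find oo repeated twice, that is, lines containing oooo

[root@localhost nicky]# grep -n '\(oo\)\{2\}' new.txt

Scenario: find 2 to 5 consecutive o's

[root@localhost nicky]# grep -n 'o\{2,5\}' new.txt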
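The interval syntax can be explored with a small sketch such as the one below. The sample words are invented; the last command shows the same idea in extended syntax (-E), where the braces need no backslashes.

```shell
# Made-up samples with 1, 2, 3, 4 and 6 consecutive o's
samples=$(printf '%s\n' 'go' 'goo' 'gooo' 'goooo' 'gooooood')

# o\{2\}: at least two consecutive o's somewhere in the line
two_plus=$(printf '%s\n' "$samples" | grep -c 'o\{2\}')

# With -x the whole line must match: g followed by 2 to 3 o's
exact=$(printf '%s\n' "$samples" | grep -x 'go\{2,3\}')

# \(oo\)\{2\}: the group "oo" twice in a row, i.e. oooo
grouped=$(printf '%s\n' "$samples" | grep -c '\(oo\)\{2\}')

# Same interval in extended regular expression syntax
ere=$(printf '%s\n' "$samples" | grep -Ec 'o{2,5}')

echo "$two_plus $grouped $ere"
echo "$exact"
```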

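7. The sed stream editor

What it can do:

1 Process standard input (for example, the output of another command)

2 Add lines

3 Delete lines

4 Replace lines

5 Select specific lines

Command format: sed [options] 'action' [file]

Options:

-n: silent mode. By default sed prints every input line to the screen; with -n, only the lines that a command prints explicitly (for example with p) are shown

-e: give the sed script directly on the command line; it can be repeated to chain several actions

-f file: read the sed script from file

-r: use extended regular expressions (GNU sed; -E is also accepted by GNU and BSD sed)

-i: edit the input file in place instead of printing the result to the screen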
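To make the -n, -e and -f options more concrete, here is a small sketch. The input lines and the temporary script file are made up for illustration.

```shell
# Made-up four-line input
input=$(printf '%s\n' one two three four)

# -n suppresses automatic printing; p prints only lines 2-3
only_2_3=$(printf '%s\n' "$input" | sed -n '2,3p')

# -e can be repeated to run several actions in order
multi=$(printf '%s\n' "$input" | sed -e '1d' -e 's/two/TWO/')

# -f reads the same two actions from a script file instead
script=$(mktemp)
printf '%s\n' '1d' 's/two/TWO/' > "$script"
from_file=$(printf '%s\n' "$input" | sed -f "$script")
rm -f "$script"

echo "$only_2_3"
echo "$multi"
echo "$from_file"
```

The -e and -f runs produce the same output, since the script file holds exactly the actions given on the command line.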

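Action: [n1[,n2]] function

Note: n1 and n2 are line numbers and are optional

Functions:

a text: append text as a new line after the current line

i text: insert text as a new line before the current line

d: delete; takes no text

c text: replace the selected lines with text

s: substitute, usually combined with a regular expression: s/old/new/g

p: print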

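Scenario: read the file and delete lines 6-7

[root@localhost nicky]# less test.txt | sed '6,7d'

Scenario: delete line 5 with -n set. Because -n suppresses automatic printing and no p command is given, nothing is shown on screen

[root@localhost nicky]# less test.txt | sed -n '5d'

Scenario: read the file and add beautiful girl below the first line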

[root@localhost nicky]# less test.txt | sed '1a beautiful girl'

When errors occur during the execution of a deployment template script, consult the error messages in the

beautiful girl

Scenario: read the file and add handsome boy above the first line

[root@localhost nicky]# less test.txt | sed '1i handsome boy'

handsome boy

When errors occur during the execution of a …

Scenario: replace lines 2-5 with NO 2-5 Showing Replace String

[root@localhost nicky]# less test.txt | sed '2,5c NO 2-5 Showing Replace String'

When errors occur during the execution of a deployment template script, consult the error messages in the

NO 2-5 Showing Replace String

Nicky test sed function

Scenario: list only lines 2-4

[root@localhost nicky]# less test.txt | sed -n '2,4p'

log files of the Endeca Application Controller (EAC) or the Workbench for information about the errors. These

messages can help you analyze the cause of the errors by revealing the server state, operations performed,

and exceptions encountered by Workbench or EAC Note that deployment template scripts rely primarily on

Scenario: substitution with a regular expression

[root@localhost nicky]# less test.txt | grep -n 'error' | sed 's/error/danger/g'

1:When dangers occur during the execution of a deployment template script, consult the danger messages in the

2:log files of the Endeca Application Controller (EAC) or the Workbench for information about the dangers. These

3:messages can help you analyze the cause of the dangers by revealing the server state, operations performed

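Scenario: modify the original file directly. By default sed reads the file and writes the processed result to standard output, so the source file is not affected. To change the source file itself, use this form:

sed -i 'action' file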

[root@localhost nicky]# sed -i 's/error/danger/g' test.txt
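The scenarios above can be reproduced end to end on a throwaway file with a sketch like this one. It assumes GNU sed (the one-line forms of a, i and c, and -i without a backup suffix, are GNU behavior), and the file contents are invented rather than the author's test.txt.

```shell
# Throwaway file with made-up content
f=$(mktemp)
printf '%s\n' 'line1 error' 'line2' 'line3 error error' 'line4' > "$f"

deleted=$(sed '2,3d' "$f")                             # d: drop lines 2-3
appended=$(sed '1a beautiful girl' "$f" | sed -n '2p') # a: new line after line 1
inserted=$(sed '1i handsome boy' "$f" | sed -n '1p')   # i: new line before line 1
changed=$(sed '2,3c REPLACED' "$f")                    # c: lines 2-3 become one line
replaced=$(sed -n 's/error/danger/gp' "$f")            # s + p: show only changed lines

# -i rewrites the file itself; nothing is printed
sed -i 's/error/danger/g' "$f"
in_place=$(grep -c 'danger' "$f")

echo "$deleted"
echo "$appended"
echo "$inserted"
echo "$changed"
echo "$replaced"
echo "$in_place"

rm -f "$f"
```

Reading the file directly with sed '...' file avoids the extra process that piping through less or cat would start.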

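8. The awk text processor

awk is a very handy tool for processing text data. Where sed works on whole lines, awk is better at working on the fields within a line.

First, awk splits each line into fields and assigns them to variables:

1 $0: the whole line

2 $1: the first column

3 $2: the second column

4 $n: the nth column

Format:

1 A file as the input source: awk '{action}' file

2 The output of a command as the input source: command | awk '{action}'

Scenario: print the first column, then print the first and fourth columns with a label in front of each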

[root@localhost nicky]# awk '{print $1}' address.txt

[root@localhost nicky]# awk '{print "name:" $1 "\tbirthday:" $4}' address.txt

name:nicky birthday:1987-06-12

name:belly birthday:1988-02-03

name:hejuan birthday:1989-03-04

name:alice birthday:1987-01-12

name:frank birthday:1985-12-12
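A minimal sketch of field access follows. The two records are invented stand-ins for address.txt, whose full contents are not shown in the article; only the first and fourth columns mirror the output above.

```shell
# Made-up records: name, city, phone prefix, birthday
data=$(printf '%s\n' 'nicky beijing 138 1987-06-12' 'belly shanghai 139 1988-02-03')

# $1 is the first whitespace-separated field
first=$(printf '%s\n' "$data" | awk '{print $1}')

# String literals and fields placed side by side are concatenated
labeled=$(printf '%s\n' "$data" | awk '{print "name:" $1 "\tbirthday:" $4}')

# NF is the field count, so $NF is always the last field
last_field=$(printf '%s\n' "$data" | awk '{print $NF}')

echo "$first"
echo "$labeled"
echo "$last_field"
```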

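Variables and arrays:

8.1 Variables

awk has two kinds of variables, user-defined and built-in. The commonly used built-in variables are:

Variable	Description
FILENAME	Name of the current input file
FNR	Record number within the current input file
FS	Input field separator; may be a regular expression
NF	Number of fields in the current record
NR	Total number of records read so far
OFS	Output field separator
ORS	Output record separator
RS	Input record separator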
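The built-in variables in the table can be observed with a sketch like this, which reads two small temporary files. File contents and variable names are made up for illustration.

```shell
# Two throwaway input files with made-up content
a=$(mktemp)
b=$(mktemp)
printf '%s\n' 'x y z' 'p q' > "$a"
printf '%s\n' 'm n' > "$b"

# NR keeps counting across files; FNR restarts at 1 for each file
counts=$(awk '{print NR, FNR, NF}' "$a" "$b")

# OFS goes between the print arguments, ORS after each record
joined=$(awk 'BEGIN { OFS="-"; ORS=";" } { print $1, $2 }' "$a")

# FILENAME holds the file currently being read
seen_second=$(awk -v want="$b" 'FILENAME == want { n++ } END { print n + 0 }' "$a" "$b")

echo "$counts"
echo "$joined"
echo "$seen_second"

rm -f "$a" "$b"
```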

8.2 Arrays

Array subscripts can be numbers or strings

Delete one element: delete array[index]

Delete the whole array: delete array

Arrays can be used without being declared
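A short sketch of an undeclared array with string subscripts, including deletion of a single element. The fruit data and the array name total are invented for this example.

```shell
result=$(printf '%s\n' 'apple 3' 'pear 2' 'apple 4' |
  awk '
    # total is created on first use; the subscript is a string
    { total[$1] += $2 }
    END {
      delete total["pear"]                  # remove one element
      if ("pear" in total) print "pear still present"
      print "apple", total["apple"]
    }')

echo "$result"
```

The in operator tests whether a subscript exists without creating it, which a plain reference such as total["pear"] would do.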

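8.3 Environment variables

Access an environment variable: ENVIRON["PATH"]

Using other field separators:

By default awk splits fields on runs of whitespace (spaces and tabs). What if the fields are separated by some other character, such as : or |? Use the -F option: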

[root@localhost nicky]# awk -F":" '{print "User:" $1 "\tPassword:" $2}' new.txt

Scenario: use several characters as the separator

[root@localhost nicky]# awk -F"[\t ]+" '{print "Device:" $1 "\tFStype:" $3}' /etc/fstab
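The environment lookup and the -F option can both be tried with the sketch below. DEMO_VAR is a hypothetical variable set only for this command, and the input records are made-up stand-ins for /etc/passwd and /etc/fstab lines.

```shell
# ENVIRON is an array indexed by environment variable name
env_val=$(DEMO_VAR=hello awk 'BEGIN { print ENVIRON["DEMO_VAR"] }')

# A single-character separator
rec='root:x:0:0:root:/root:/bin/bash'
user_shell=$(printf '%s\n' "$rec" | awk -F':' '{print $1, $7}')

# A regular expression as separator: one or more tabs or spaces
multi=$(printf 'dev1 \t  /mnt\text4\n' | awk -F'[\t ]+' '{print $1, $3}')

echo "$env_val"
echo "$user_shell"
echo "$multi"
```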

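Structure of an awk program:

An awk program has three parts: the BEGIN block | the main block | the END block

Think of them as: initialization before any input is read / processing of each input record / everything that happens after all input has been processed

Example: a BEGIN block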

[root@localhost nicky]# cat > fs.awk << "end"

> BEGIN {

> FS=":"

> }

> {

> print "USER:" $1 "\tSHELL:"$7

> }

> end

[root@localhost nicky]# cat fs.awk

BEGIN {

FS=":"

}

{

print "USER:" $1 "\tSHELL:"$7

}

[root@localhost nicky]# head -n 10 /etc/passwd | awk -f fs.awk

USER:root SHELL:/bin/bash

USER:bin SHELL:/sbin/nologin

USER:daemon SHELL:/sbin/nologin

USER:adm SHELL:/sbin/nologin

USER:lp SHELL:/sbin/nologin

USER:sync SHELL:/bin/sync

USER:shutdown SHELL:/sbin/shutdown

USER:halt SHELL:/sbin/halt

USER:mail SHELL:/sbin/nologin

USER:uucp SHELL:/sbin/nologin
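The example above only uses BEGIN and the main block. A sketch that also uses an END block might look like this; the three input records are invented, shortened passwd-style lines.

```shell
summary=$(printf '%s\n' 'root:x:0' 'bin:x:1' 'daemon:x:2' |
  awk '
    BEGIN { FS = ":"; n = 0 }        # runs once, before any input
    { n++; last = $1 }               # runs for every record
    END { print n " users, last: " last }   # runs once, after all input
  ')

echo "$summary"
```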

Pattern matching: awk '/regular expression/ {action on match}' file

Print a message for every empty line:

[root@localhost nicky]# awk '/^$/{print "This line is empty"}' tmp.txt
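As a self-contained variant of the command above, this sketch feeds awk a made-up four-line input with two empty lines instead of tmp.txt.

```shell
# Input: "a", empty, "b", empty
messages=$(printf 'a\n\nb\n\n' | awk '/^$/ { print "This line is empty" }')

# The same pattern used to count instead of print
empty_count=$(printf 'a\n\nb\n\n' | awk '/^$/ { n++ } END { print n + 0 }')

echo "$messages"
echo "$empty_count"
```

A rule without a pattern runs for every record; a rule with a pattern runs only for the records that match it.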

Conditionals and loops:

The if statement:

if(expression) {

statement;....

} else if(expression){

statement;....

}else{

statement;....

}

[root@localhost nicky]# awk '{if($1 > $2) print $1 " too high"}' num.txt

The while and do/while statements (awk keywords are lowercase):

while(expression){

statement;....

}

do{

statement;....

}while(expression);
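The three control structures above can be exercised with a sketch like the following. The number pairs are invented and stand in for num.txt.

```shell
# if / else on each input record
verdicts=$(printf '%s\n' '5 3' '1 9' | awk '{
  if ($1 > $2) {
    print $1 " too high"
  } else {
    print $1 " ok"
  }
}')

# while: sum of 1..5
sum=$(awk 'BEGIN {
  i = 1
  while (i <= 5) {
    s += i
    i++
  }
  print s
}')

# do/while: the body runs at least once; computes 4 factorial
fact=$(awk 'BEGIN {
  n = 4
  f = 1
  do {
    f *= n
    n--
  } while (n > 1)
  print f
}')

echo "$verdicts"
echo "$sum $fact"
```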

Multi-line records:

Sometimes a record spans several lines, as in this file:

[root@localhost nicky]# cat multiLineRecord.txt

Jimmy the Weasel

100 Pleasant Drive

San Fransical,CA 12345

Big Tony

200 Incognito Ave

Suburbia,WA 67890

Oilir zhang

NorthEast Normal University

Renmin Road,changchun,210000

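Ideally, awk would treat every three lines as one record rather than every line as a record. If awk takes the first line as $1, the second as $2, and the third as $3, processing becomes fairly simple. Setting RS to the empty string makes blank lines the record separator, and setting FS to a newline makes each line a field.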

[root@localhost nicky]# cat a.awk

BEGIN{

FS="\n" #per field as a line

RS="" #record split by blank

}

{

print "Name:"$1 "\tUniversity: " $2 "\tAddress:" $3

}

[root@localhost nicky]# awk -f a.awk multiLineRecord.txt

Name:Jimmy the Weasel University: 100 Pleasant Drive Address:San Fransical,CA 12345

Name:Big Tony University: 200 Incognito Ave Address:Suburbia,WA 67890

Name:Oilir zhang University: NorthEast Normal University Address:Renmin Road,changchun,210000

To change the output field separator, set OFS. For example:

[root@localhost nicky]# cat a.awk

BEGIN{

FS="\n" #per field as a line

RS="" #record split by blank

OFS=", " # Define the output field delimiter

}

{

print $1, $2, $3

}

[root@localhost nicky]# awk -f a.awk multiLineRecord.txt

Jimmy the Weasel, 100 Pleasant Drive, San Fransical,CA 12345

Big Tony, 200 Incognito Ave, Suburbia,WA 67890

Oilir zhang, NorthEast Normal University, Renmin Road,changchun,210000
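The same RS and FS settings can be tried without creating any files, as in this sketch. The two records and the " | " output separator are made up for illustration.

```shell
# Two made-up records, separated by one blank line
records=$(printf 'Ann\n1 Road\nCity A\n\nBob\n2 Street\nCity B\n' |
  awk '
    BEGIN {
      RS = ""       # blank lines separate records
      FS = "\n"     # each line of a record is one field
      OFS = " | "   # placed between the print arguments
    }
    { print $1, $2, $3 }
  ')

echo "$records"
```

OFS only takes effect where print arguments are separated by commas; fields joined by plain juxtaposition are concatenated with nothing in between.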

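Commonly used functions:

awk function	Description
sub(/reg/,new,target)	Replace only the first match of reg in target ($0 if target is omitted)
gsub(/reg/,new,target)	Replace every match of reg in target ($0 if target is omitted)
index(str,substr)	Position of substr in str (counting from 1), or 0 if not found
length(str)	Length of the string
match(str,/reg/)	Position where /reg/ first matches in str, or 0 if there is no match; also sets RSTART and RLENGTH
split(str,array,sep)	Split str into array using the separator sep
substr(str,p1[,length])	Return length characters starting at position p1
toupper(str)	Convert the string to upper case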
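Each function in the table is called once in the sketch below, on a made-up string. Note that sub and gsub modify their third argument in place, so the example copies the string first.

```shell
out=$(awk 'BEGIN {
  s = "foo bar foo"

  t = s; sub(/foo/, "X", t)        # first match only
  print t

  u = s; n = gsub(/foo/, "X", u)   # every match; returns the count
  print u, n

  print index(s, "bar")            # 1-based position
  print length(s)

  m = match(s, /ba+r/)             # also sets RSTART and RLENGTH
  print m, RSTART, RLENGTH

  k = split("a:b:c", parts, ":")   # returns the number of pieces
  print k, parts[2]

  print substr(s, 5, 3)
  print toupper(s)
}')

echo "$out"
```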

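9. Comparing the awk and sed stream editing tools

How sed works:

sed reads a line into the pattern space, runs the sed commands on the pattern space, prints the result, and empties the pattern space. It repeats this for each line until the whole file has been processed.

There is also the hold space, a buffer where the contents of the pattern space can be parked temporarily.

Command	Function
a\	Append a line of text after the current line
c\	Replace the text of the current line with new text
i\	Insert text before the current line
d	Delete the line
h	Copy the pattern space to the hold space
H	Append the pattern space to the hold space
g	Copy the hold space to the pattern space, overwriting its contents
G	Append the hold space to the pattern space
l	List the line with non-printing characters made visible
p	Print the line
P (uppercase)	Multi-line print: print the first part of a multi-line pattern space, up to the first embedded newline
n	Print the pattern space and read the next line
N	Read the next input line and append it to the pattern space, creating a multi-line pattern space
q	Quit
r	Read lines in from a file
!	Apply the command to all lines other than the selected ones
s	Substitute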
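Two classic one-liners show the pattern space and hold space commands from the table working together. They are offered as a sketch on made-up input, not as something taken from the article.

```shell
# Reverse the order of lines:
#   1!G  on every line except the first, append the hold space
#   h    copy the result back into the hold space
#   $p   on the last line, print what has accumulated
reversed=$(printf '%s\n' 1 2 3 | sed -n '1!G;h;$p')

# Join lines in pairs:
#   $!N       unless on the last line, append the next line
#   s/\n/ /   replace the embedded newline with a space
paired=$(printf '%s\n' a b c d | sed '$!N;s/\n/ /')

echo "$reversed"
echo "$paired"
```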

Add a blank line after every line:

[root@localhost nicky]# cat new.txt | sed G

[root@localhost nicky]# cat new1.txt | awk '{printf("%s\n\n",$0)}'

Delete the empty lines, then add a blank line after every remaining line:

[root@localhost nicky]# cat new.txt | sed '/^$/d;G'

[root@localhost nicky]# cat new1.txt | awk '!/^$/{printf("%s\n\n",$0)}'
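To check that the sed and awk versions really produce the same result, a sketch like this compares them on a made-up three-line input whose middle line is empty.

```shell
# sed: drop empty lines, then G appends the (empty) hold space,
# which adds one blank line after each remaining line
with_sed=$(printf 'a\n\nb\n' | sed '/^$/d;G')

# awk: for every non-empty line, print it followed by a blank line
with_awk=$(printf 'a\n\nb\n' | awk '!/^$/ { printf("%s\n\n", $0) }')

echo "$with_sed"
echo "$with_awk"
```

Command substitution strips trailing newlines, so the final blank line is not visible in the captured variables.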
