Linux Shell Scripting Cookbook, Chapter 4, Texting and Driving: grep, cut, sed, awk, paste

IanWatson

Categories: python, shell. Tags: linux, python

Article link: https://blog.csdn.net/IanWatson/article/details/105484948

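1. Regular expressions

Regular expression reference: https://tool.oschina.net/uploads/apidocs/jquery/regexp.html

Examples

Match an email address: [a-z0-9_]+@[a-z0-9]+\.[a-z]+
Match every word: ( ?[a-zA-Z]+ ?)
Match an IP address: [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
Handling special characters
Special characters include $, +, ^, *, . and others.
To match one literally, put a backslash \ in front of it. For example, the file name a.txt is written a\.txt in a regular expression.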
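
A minimal sketch of the email and IP patterns in use, together with the effect of escaping the dot. The sample text, address and variable names are made up for illustration. Note that the IP pattern only checks the shape of the address, so it would also accept something like 999.999.999.999.

```shell
# Hypothetical sample data; the address and host below are made up.
ip_re='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
mail_re='[a-z0-9_]+@[a-z0-9]+\.[a-z]+'

text='contact: ian_w@example.com, host 192.168.1.10, version 1-2-3-4'

# -o prints only the matched part, -E enables extended regular expressions.
ips=$(printf '%s\n' "$text" | grep -oE "$ip_re")
mails=$(printf '%s\n' "$text" | grep -oE "$mail_re")
echo "ip:   $ips"
echo "mail: $mails"

# Without the backslash, "." matches any character, so a.txt also matches abtxt.
loose=$(printf 'a.txt\nabtxt\n' | grep -c 'a.txt')
strict=$(printf 'a.txt\nabtxt\n' | grep -c 'a\.txt')
echo "unescaped: $loose lines, escaped: $strict line"
```

The patterns are kept in single quotes so the shell passes the backslashes through to grep unchanged.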

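2. Searching text with grep

Common options
grep pattern file: print the lines that match the pattern
-o: print only the matched part
-v: invert the match and print the lines that do not match
-c: count the matching lines
-n: print the line number along with each line
-b: print the byte offset of each match, usually combined with -o
-l: when searching several files, print only the names of the files that contain a match
-R: search recursively
-i: ignore case
-e: specify multiple patterns, grep -e pattern1 -e pattern2
-f: read patterns from a file
-Z: terminate each output file name with a '\0' byte (used together with -l)
-q: quiet mode, print nothing; only the exit status reports whether a match was found
-A 5, -B 5, -C 5: also print the 5 lines after, before, or on both sides of each match

grep -o | wc -l: count the number of matches
--include: search only files matching the glob, e.g. --include=*.{cpp,c}
--exclude: skip files matching the glob, e.g. --exclude=*.{md,git}
--color=auto: highlight the matched text
grep -E or egrep: match using extended regular expressions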
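
The context options -A, -B and -C are easy to mix up, so here is a small sketch on a five-line input generated with seq. The variable names are only for illustration.

```shell
# Input is the numbers 1..5, one per line; the pattern 3 matches only the third line.
after=$(seq 1 5 | grep -A 1 3)    # the match plus 1 line after it
before=$(seq 1 5 | grep -B 1 3)   # 1 line before the match plus the match
around=$(seq 1 5 | grep -C 1 3)   # 1 line on each side of the match

echo "after:";  echo "$after"
echo "before:"; echo "$before"
echo "around:"; echo "$around"
```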
Examples

Ian>grep maybe a.txt 
It's maybe the slowest bus Which colour for her is right
It's maybe the slowest bus
Ian>grep  -E "[W-Z]+" a.txt 
It's maybe the slowest bus Which colour for her is right
You tell me truth will not be here
Which colour for her is right
You tell me they are tough and red
Ian>grep  -v i a.txt 
And I see all the teenagers'eyes you tell me they are tough and red
It's maybe the slowest bus
And I see all the teenagers'eyes
You tell me they are tough and red
Ian>grep  -o up a.txt 
up
Ian>grep  -c it a.txt 
10
Ian>grep  -o u a.txt |wc -l
26
Ian>grep  -n up a.txt
8:Put up you in sandwiches hands oh I think it not really cool
Ian>grep  -b -o up a.txt
366:up
Ian>grep -l "it" a.txt b.txt 
a.txt
b.txt
Ian>grep -R -n "up" 
out.html:51:		 Charsets / OS/2 support © 2001 by Kyosuke Tokoro
grep/a.txt:8:Put up you in sandwiches hands oh I think it not really cool
grep/b.txt:8:Put up you in sandwiches hands oh I think it not really cool
Ian>grep -i "RED" a.txt 
And I see all the teenagers'eyes you tell me they are tough and red
You tell me they are tough and red
Ian>grep -e "up" -e "red" a.txt 
And I see all the teenagers'eyes you tell me they are tough and red
Put up you in sandwiches hands oh I think it not really cool
You tell me they are tough and red
Ian>cat pat_file 
you
up
Ian>grep -f pat_file a.txt 
And I see all the teenagers'eyes you tell me they are tough and red
Say say it again you know the past things could set my free
Put up you in sandwiches hands oh I think it not really cool
I've found it in your eyes
Say say it again you know the past things could set my free
Say say it again you know the past things could set my free
Say say it again you know the past things could set my free
Ian>grep -q "dada" a.txt 
Ian>echo $?
1
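
In a script, -q is usually used directly as the condition of an if. The sketch below also shows why -c and -o | wc -l can give different numbers: the first counts matching lines, the second counts matches. The log file and its contents are hypothetical.

```shell
# Hypothetical log file created in a temporary directory.
tmp=$(mktemp -d)
printf 'error error\nok\nerror\n' > "$tmp/app.log"

# -q prints nothing; the exit status alone says whether a match exists.
if grep -q "error" "$tmp/app.log"; then
    found=yes
else
    found=no
fi

lines=$(grep -c "error" "$tmp/app.log")          # number of matching lines
hits=$(grep -o "error" "$tmp/app.log" | wc -l)   # number of individual matches
hits=$((hits))                                   # drop padding that some wc versions add

echo "found=$found lines=$lines hits=$hits"
rm -rf "$tmp"
```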

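3. Splitting files by column with cut

Common options
-d: specify the delimiter
-f: select fields
-c: select characters
-b: select bytes
-c/-f/-b 2,3: the 2nd and 3rd character, field, or byte
-c/-f/-b 2-4: the 2nd through 4th character, field, or byte
-c/-f/-b -4: the 1st through 4th character, field, or byte
-c/-f/-b 5-: from the 5th character, field, or byte to the end of the line
--output-delimiter ",": the separator placed between selected ranges in the output, e.g. -c 2-4,5-6 --output-delimiter ","
--complement: output everything except the selected part
Examples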

Ian>cat a.txt 
1 2 3 4 5 6 
2 4 6 8 9 11
3 6 9 11 12 13
Ian>cut -d " " -f 2-3 a.txt 
2 3
4 6
6 9
Ian>cut -d " " -f 2-3,4-5 a.txt --output-delimiter "."
2.3.4.5
4.6.8.9
6.9.11.12
Ian>cut -c 2-3,4-8 a.txt --output-delimiter "."
 2. 3 4 
 4. 6 8 
 6. 9 11
Ian>cut -c 2-3,4- a.txt --output-delimiter "."
 2. 3 4 5 6 
 4. 6 8 9 11
 6. 9 11 12 13
Ian>cut -b 2-6,7- a.txt --output-delimiter "...."
 2 3 ....4 5 6 
 4 6 ....8 9 11
 6 9 ....11 12 13
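
--complement is not shown in the transcript above, so here is a small sketch of it next to a plain -f selection. The records are hypothetical, written in the style of /etc/passwd. Both --complement and --output-delimiter are GNU cut options.

```shell
# Hypothetical colon-separated records in the style of /etc/passwd.
data='root:x:0:0:root:/root:/bin/bash
ian:x:1000:1000:Ian:/home/ian:/bin/zsh'

# Fields 1 and 7: user name and login shell.
picked=$(printf '%s\n' "$data" | cut -d ":" -f 1,7)

# --complement keeps everything except fields 2-6, which leaves fields 1 and 7 again.
rest=$(printf '%s\n' "$data" | cut -d ":" -f 2-6 --complement --output-delimiter " -> ")

echo "$picked"
echo "$rest"
```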

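4. Replacing text with sed

Detailed sed guide: https://www.cnblogs.com/ftl1012/p/9250171.html
Only substitution is covered here.

Common usage
- Common commands and flags
  s command: substitute text, replacing whatever matches the given pattern with the replacement
  g flag: replace every match on the line; to replace from the 2nd match onward use 2g
  Delimiter: any character can be used as the delimiter, for example ":" or "|"; when the delimiter itself appears inside the pattern, escape it with "\"
- Common options
  -e: allow multiple edits, e.g. sed -e "s/a/b/g" -e "s/c/d/g" file
  -i: write the changes directly back to the original file
- Examples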

Ian>cat a.txt 
aa ab ac ad aa aa
Ian>sed "s/aa/SS/" a.txt # replace only the first match
SS ab ac ad aa aa
Ian>sed "s/aa/SS/2g" a.txt # replace from the second match onward
aa ab ac ad SS SS
Ian>sed -e "s/aa/SS/" -e "s/ab/DD/" a.txt # apply several expressions
SS DD ac ad aa aa
Ian>sed -i "s/aa/SS/g" a.txt 
Ian>cat a.txt 
SS ab ac ad SS SS
Ian>sed  "sa\adaSAag" a.txt # the delimiter is a; the a inside the pattern must be escaped
SS ab ac SA SS SS
Ian>cat b.txt 
1221

12312
Ian>sed "/^$/d" b.txt # delete empty lines
1221
12312
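
Choosing a different delimiter matters most when the pattern is a path. A minimal sketch, assuming a made-up PATH-style line; both commands produce the same result, the second is just easier to read.

```shell
# Hypothetical input line.
line='PATH=/usr/local/bin:/usr/bin'

# With "/" as the delimiter, every slash in the pattern and replacement must be escaped.
escaped=$(printf '%s\n' "$line" | sed 's/\/usr\/local\/bin/\/opt\/bin/')

# With "|" as the delimiter, the same substitution needs no escaping.
plain=$(printf '%s\n' "$line" | sed 's|/usr/local/bin|/opt/bin|')

echo "$escaped"
echo "$plain"
```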

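Advanced usage
- sed -i.bak: edit the file in place and keep a copy of the original as file.bak
- &: stands for the matched string
- \1, \2: stand for the first and second captured group
- Double quotes: allow shell variables inside the sed expression
- Examples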

Ian>sed -i.bakfile "s/ad/dwdwd/g" a.txt # keep a backup with the .bakfile suffix
Ian>ls
a.txt  a.txt.bakfile  b.txt
Ian>cat a.txt
SS ab ac dwdwd SS SS
Ian>sed "s/ac/[&]/g" a.txt # & stands for the matched string
SS ab [ac] dwdwd SS SS
Ian>cat a.txt
SS ab ac dwdwd SS SS
aa SS aa SS
Ian>sed 's/\([a-z]\+\) \([A-Z]\+\)/\2 \1/' a.txt # swap the two captured groups
SS ab ac SS dwdwd SS
SS aa aa SS
Ian>text="hello"
Ian>echo "hello world"|sed "s/$text/hhh/g"
hhh world
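
The difference between single and double quotes is easy to get wrong, so here is a small sketch. The variable names old and new are only for illustration.

```shell
old="world"
new="shell"

# Single quotes: the shell does not expand anything, so sed receives the literal
# text $old, which does not occur in the input, and the line is left unchanged.
single=$(echo "hello world" | sed 's/$old/$new/')

# Double quotes: the shell expands the variables before sed runs.
double=$(echo "hello world" | sed "s/$old/$new/")

# & and captured groups work the same way alongside variables.
wrapped=$(echo "hello world" | sed "s/$old/[&]/")
swapped=$(echo "hello world" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

echo "$single"; echo "$double"; echo "$wrapped"; echo "$swapped"
```

If a variable may contain the delimiter character, pick a delimiter that cannot appear in it, otherwise the expanded expression is no longer a valid s command.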

5. Advanced text processing with awk

Basic usage:

awk 'BEGIN { begin statements } pattern { commands } END { end statements }' file

Ian>awk "BEGIN { i=0 } { i++ } END { print i }" b.txt 
3

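Arguments to print that are separated by commas are printed with a space between them; inside print, double-quoted strings placed next to other values are concatenated with them.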

Ian>echo | awk "{ v1=\"v1\"; v2=\"v2\"; print v1,v2;}"
v1 v2
Ian>echo | awk "{ v1=\"v1\"; v2=\"v2\"; print v1 \"-\" v2;}"
v1-v2

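Workflow:

1. Run the commands in the BEGIN block
2. For each input line, run the commands of every pattern { commands } block whose pattern matches
3. Run the commands in the END block
Special variables

NR: the current line (record) number
NF: the number of fields in the current line
$0: the content of the current line
$1: the first field of the current line
$2: the second field of the current line
- Filtering with patterns
  NR<5: lines with a line number below 5
  NR==1,NR==4: lines 1 through 4
  /linux/: lines that contain the pattern linux
  !/linux/: lines that do not contain the pattern linux
- Examples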

Ian>echo -e "line1 f2 f3\nlin2 f4 f5" | awk '{print "Line no:"NR"  Line field:"NF"  $0="$0"  $1="$1"  $2="$2 "  $(NF-1)=" $(NF-1)}'
Line no:1  Line field:3  $0=line1 f2 f3  $1=line1  $2=f2  $(NF-1)=f2
Line no:2  Line field:3  $0=lin2 f4 f5  $1=lin2  $2=f4  $(NF-1)=f4
Ian>echo -e "line1 f2 f3\nlin2 f4 f5" | awk ' NR==1 { print $2 }' # process only the first line
f2
Ian>awk "END { print NR }" out.html # count the lines
54
Ian>seq 3 | awk ' BEGIN { sum=0; print sum } { print ; sum=sum+$1 } END { print sum }' # sum all lines
0
1
2
3
6
Ian>seq 3 | awk ' BEGIN { sum=0; print sum } NR==1,NR==2 { print ; sum=sum+$1 } END { print sum }' # sum only lines 1-2
0
1
2
3
Ian>seq 3 | awk ' BEGIN { sum=0; print sum } /2/ { print ; sum=sum+$1 } END { print sum }' # run only on lines containing 2
0
2
2
Ian>seq 3 | awk ' BEGIN { sum=0; print sum } !/2/ { print ; sum=sum+$1 } END { print sum }' # run only on lines not containing 2
0
1
3
4
Ian>echo -e "line1 f2 f3\nlin2 f4 f5" | awk '  { if(NR%2==1) print $2; }' # modulo: odd-numbered lines only
f2
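
Putting NR, NF, a field-based pattern and the END block together, a minimal sketch that summarizes a small table. The data (a name followed by two scores) and the variable names are hypothetical.

```shell
# Hypothetical scores table: a name followed by two numeric columns.
report=$(printf 'ann 90 80\nbob 70 60\ncat 50 100\n' | awk '
    NR == 1  { cols = NF }            # remember the field count of the first line
             { total += $2 + $3 }     # no pattern: runs for every line
    $2 >= 70 { passed++ }             # pattern: only lines whose 2nd field is >= 70
    END      { print NR, cols, total, passed, total / NR }
')
echo "$report"   # prints: 3 3 450 2 150
```

In the END block NR still holds the number of the last line read, which is why it can be used as the line count.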

Advanced usage
- Passing external variables to awk
- Reading a line with getline
- Using loops
- Built-in string functions: length(str), index(str, search_str), split, substr, sub, gsub, match, and others

Ian>v1=1111
Ian>v2=2222
Ian>echo | awk '{ print v1,v2 }' v1=$v1 v2=$v2 # pass in shell variables, one assignment per argument
1111 2222
Ian>seq 3|awk 'BEGIN { getline; print "getline:"$0 } { print ; }' # getline in BEGIN consumes the first line
getline:1
2
3
Ian>echo |awk '{ "grep root /etc/passwd" |getline output; print output }' # read a command's output into the variable output
root:x:0:0:root:/root:/bin/bash
Ian>echo |awk '{ for(i=0;i<3;i++) print i; }' # for loop
0
1
2
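
The string functions listed above are not exercised in the transcript, so here is a short sketch of a few of them. It also uses the -v option, another way to pass a shell variable in, which unlike trailing assignments is already set inside BEGIN. The names prefix, p, words and copy are made up for illustration.

```shell
prefix="user"
out=$(echo "hello world" | awk -v p="$prefix" '{
    n = split($0, words, " ")     # n = 2, words[1] = "hello", words[2] = "world"
    copy = $0
    gsub(/o/, "0", copy)          # replace every "o" in copy, leaving $0 untouched
    print p, length($0), index($0, "world"), substr($0, 1, 5), n, copy
}')
echo "$out"   # prints: user 11 7 hello 2 hell0 w0rld
```

Positions in index and substr are 1-based, so index($0, "world") is 7 rather than 6.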

6. Merging files by column (paste)

paste file1 file2 -d ",": merge two files column by column, using "," as the delimiter

Ian>cat f1 
1
2
3
Ian>cat f2
4
5
6
7
Ian>paste f1 f2 -d "."
1.4
2.5
3.6
.7
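
A related option is -s, which joins the lines of a single file into one line instead of pairing lines across files. A minimal sketch, recreating f1 and f2 in a temporary directory so it can be run anywhere:

```shell
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/f1"
printf '4\n5\n6\n7\n' > "$tmp/f2"

# Default mode: line N of each file ends up on output line N.
side_by_side=$(paste -d "," "$tmp/f1" "$tmp/f2")

# -s (serial): all lines of one file are joined into a single line.
joined=$(paste -s -d "," "$tmp/f1")

echo "$side_by_side"
echo "$joined"
rm -rf "$tmp"
```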

7. Printing the text between two patterns

awk '/start_pattern/, /end_pattern/' file

Ian>cat f1 
1
2
3
Ian>awk '/1/,/2/' f1
1
2
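
A slightly more realistic sketch: pulling one section out of an INI-style text. The section names and keys are hypothetical. sed has the same address-range form, shown here only for comparison.

```shell
# Hypothetical config text; the section markers are made up for illustration.
conf='[global]
debug=0
[db]
host=localhost
port=5432
[cache]
ttl=60'

# Lines from the one matching [db] through the next one matching [cache].
with_awk=$(printf '%s\n' "$conf" | awk '/^\[db\]/, /^\[cache\]/')

# sed equivalent: -n suppresses the default output, p prints the selected range.
with_sed=$(printf '%s\n' "$conf" | sed -n '/^\[db\]/,/^\[cache\]/p')

echo "$with_awk"
```

Both boundary lines are included in the output. If the end pattern never matches, the range runs to the end of the input.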

8. Printing text in reverse order

The tac command
tac is cat in reverse

Ian>cat f1
1
2
3
Ian>tac f1
3
2
1

Implementing a stack with awk

Ian>awk '{ stack[NR]=$0 } END { for(i=NR;i>0;i--) print stack[i]; }' f1
3
2
1
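
One possible use of reverse order is showing the newest entries of a log first. The sketch below does this with tac (GNU coreutils) and with the awk stack from above, limited to the last two lines. The log contents are made up.

```shell
# Hypothetical log, oldest entry first.
log='first
second
third'

# tac reverses the lines, head keeps the first two of the reversed output.
newest_two=$(printf '%s\n' "$log" | tac | head -n 2)

# Same result with the awk stack, popping only the top two entries.
via_awk=$(printf '%s\n' "$log" | awk '
    { stack[NR] = $0 }
    END { for (i = NR; i > NR - 2 && i > 0; i--) print stack[i] }
')

echo "$newest_two"
```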

9. Replacing text in all files in a directory

Ian>cat test.txt 
a s d f
Ian>find ./ -name "*.txt" -print0 | xargs -I{} -0 sed -i "s/a/b/g" {}
Ian>cat test.txt 
b s d f
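
A variant worth considering, sketched here in a temporary directory with made-up file names: find's own -exec ... {} + in place of xargs, plus -i.bak so every changed file keeps a backup. This is an alternative to the command above, not the book's version, and it assumes GNU sed.

```shell
tmp=$(mktemp -d)
printf 'a s d f\n' > "$tmp/test.txt"
printf 'a s d f\n' > "$tmp/my notes.txt"   # file name containing a space
printf 'a s d f\n' > "$tmp/skip.log"       # not matched by the name filter

# The glob is quoted so that find, not the shell, expands it.
# -exec ... {} + hands many file names to a single sed call and copes with spaces.
find "$tmp" -type f -name "*.txt" -exec sed -i.bak "s/a/b/g" {} +

txt=$(cat "$tmp/test.txt")
spaced=$(cat "$tmp/my notes.txt")
untouched=$(cat "$tmp/skip.log")
backup=$(cat "$tmp/test.txt.bak")
echo "$txt | $spaced | $untouched | $backup"
rm -rf "$tmp"
```

Leaving the glob unquoted works only while no matching file exists in the current directory; otherwise the shell expands it before find sees it.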
