The grep command

pluto_peach

Category: linux

Article link: https://blog.csdn.net/weixin_45415743/article/details/108039017

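Searching text with grep

Search for a word in a file

 $ grep match_pattern filename
 or
 $ grep "match_pattern" filename    # prints the lines that contain match_pattern
 or read the input from stdin
 $ echo -e "this is a word \nnext line" | grep word
 Several files can be searched at once
 $ grep "match_text" file1 file2 file3 ...
 The --color option highlights the matched word in each output line:
 $ grep word filename --color=auto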
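As a small self-contained illustration of the commands above, the sketch below builds a throwaway file and searches it. The file name demo.txt and its contents are made up for the example.

```shell
# Create a scratch file (hypothetical contents, for illustration only)
tmpdir=$(mktemp -d)
printf 'this is a word\nnext line\nanother word here\n' > "$tmpdir/demo.txt"

# Print every line of the file that contains "word"
matches=$(grep "word" "$tmpdir/demo.txt")
echo "$matches"

# The same search, reading from stdin instead of a file
from_stdin=$(printf 'this is a word\nnext line\n' | grep "word")
echo "$from_stdin"

rm -r "$tmpdir"
```

Only the lines that contain the pattern are printed; "next line" is dropped in both cases.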

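Regular expressions

 $ grep -E "[a-z]+" filename
 or
 $ egrep "[a-z]+" filename    # egrep is an older alias for grep -E; recent GNU grep releases mark it as obsolescent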

To print only the part of the text that matched, use -o:

 $ echo this is a line. | grep -o -E "[a-z]+\."
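To make the effect of -o concrete, here is a short sketch. The second input string is invented for the example.

```shell
# -o prints each match on its own line instead of the whole matching line
only_match=$(echo "this is a line." | grep -o -E "[a-z]+\.")
echo "$only_match"

# Several matches on one input line produce several output lines
words=$(echo "alpha beta gamma" | grep -o -E "[a-z]+")
echo "$words"
```

The first command prints only `line.`, because that is the only run of lowercase letters followed by a literal dot.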

Print every line except those that contain match_pattern (-v inverts the match):

 $ grep -v match_pattern file

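Count the lines in a file or text that contain the matching string (this is the number of matching lines, not the number of matches):

 $ grep -c "text" filename

To count the individual matches instead:

 $ echo -e "1 2 3 4\nhello\n5 6" | grep -E -o "[0-9]" | wc -l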
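The difference between counting lines and counting matches is easy to miss, so here is a side-by-side sketch on the same three-line input used above.

```shell
text=$(printf '1 2 3 4\nhello\n5 6\n')

# -c counts matching lines: only the first and third lines contain a digit
line_count=$(printf '%s\n' "$text" | grep -c "[0-9]")

# -o emits one output line per match, so wc -l counts individual matches
# (tr strips the padding that some wc implementations add)
match_count=$(printf '%s\n' "$text" | grep -o "[0-9]" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"
```

Two lines contain a digit, while six separate digits are matched.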

Print the line numbers of the lines that contain the match

 $ cat sample1.txt
 $ grep linux -n sample1.txt
 2:linux is fun
 or
 $ cat sample1.txt | grep linux -n
 When several files are given, the file name is printed as well
 $ grep linux -n sample1.txt sample2.txt

Print the character or byte offset at which the pattern matches (offsets start at 0)

 $ echo gnu is not unix | grep -b -o "not"
 7:not

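Search multiple files and find out which file contains the matching text


 $ grep -l linux sample1.txt sample2.txt
 The opposite of -l is -L, which returns the list of files that do not match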
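The line-number and file-listing options can be tried together on two scratch files. The contents of sample1.txt and sample2.txt below are assumptions made for the example; the original does not show them.

```shell
tmpdir=$(mktemp -d)
printf 'gnu is not unix\nlinux is fun\n' > "$tmpdir/sample1.txt"
printf 'bash is a shell\n' > "$tmpdir/sample2.txt"

# -n prefixes the line number; with several files the file name comes first
numbered=$(cd "$tmpdir" && grep -n linux sample1.txt sample2.txt)
echo "$numbered"

# -l lists the files that contain a match, -L the files that do not
with_match=$(cd "$tmpdir" && grep -l linux sample1.txt sample2.txt)
without_match=$(cd "$tmpdir" && grep -L linux sample1.txt sample2.txt)
echo "match: $with_match, no match: $without_match"

rm -r "$tmpdir"
```

With these contents, only sample1.txt contains "linux", on its second line.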

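More usage


 1. Recursive search (search for text through a multi-level directory tree)
 $ grep "text" . -R -n
 
 2. Ignoring case in the pattern
 The -i option makes the match case-insensitive
 $ echo hello world | grep -i "HELLO"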
 
 3. Matching multiple patterns with grep
 Use -e once per pattern
 $ grep -e "pattern1" -e "pattern2"
 For example:
 $ echo this is a line of text | grep -e "this" -e "line" -o
 Or use -f to read the patterns from a file, one pattern per line
 $ cat pat_file
 hello
 cool
 $ echo hello this is cool | grep -f pat_file
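Both ways of supplying several patterns can be checked with the sketch below. The pattern file is created on the fly; -o is added to the -f example only to make the individual matches visible.

```shell
tmpdir=$(mktemp -d)
# One pattern per line (scratch pattern file for the example)
printf 'hello\ncool\n' > "$tmpdir/pat_file"

# -e may be repeated; a line matches if any one of the patterns matches
by_e=$(echo "this is a line of text" | grep -o -e "this" -e "line")
echo "$by_e"

# -f reads the same kind of pattern list from a file
by_f=$(echo "hello this is cool" | grep -o -f "$tmpdir/pat_file")
echo "$by_f"

rm -r "$tmpdir"
```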
 
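 4. Including or excluding files in a grep search
 Search recursively in .c and .cpp files only (quote the globs so that the shell passes them to grep unexpanded)
 $ grep "main()" . -r --include="*.c" --include="*.cpp"
 Exclude all README files from the search
 $ grep "main()" . -r --exclude "README"
 To exclude directories, use --exclude-dir
 To read the exclusion patterns from a file, use --exclude-from FILE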
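A sketch of the include and exclude filters on a scratch directory tree follows. The directory layout and file names are invented for the example, and the path format in the output assumes GNU grep.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/src" "$tmpdir/docs"
printf 'int main() { return 0; }\n'  > "$tmpdir/src/app.c"
printf 'int main() { return 0; }\n'  > "$tmpdir/src/tool.cpp"
printf 'call main() to start\n'      > "$tmpdir/README"
printf 'main() is documented here\n' > "$tmpdir/docs/notes.txt"

# Only .c and .cpp files are searched
sources=$(cd "$tmpdir" && grep -rl "main()" . --include="*.c" --include="*.cpp" | sort)
echo "$sources"

# Everything except README files and the docs directory is searched
filtered=$(cd "$tmpdir" && grep -rl "main()" . --exclude="README" --exclude-dir="docs" | sort)
echo "$filtered"

rm -r "$tmpdir"
```

All four files contain "main()", but each search reports only the two source files: the first because of the include globs, the second because README and docs are skipped.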
 
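 5. Using grep with xargs and a zero-byte (NUL) suffix
 -Z ends each file name printed by -l with a NUL byte, and xargs -0 splits its input on NUL, so file names containing spaces are handled safely. Note that this command deletes the matching files.
 $ grep "test" file* -lZ | xargs -0 rm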
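The reason for the NUL separator shows up as soon as a file name contains a space. The sketch below uses --null, the long form of -Z in GNU grep, and works only inside a scratch directory; the file names are made up.

```shell
tmpdir=$(mktemp -d)
printf 'test data\n'  > "$tmpdir/file one.txt"   # name contains a space
printf 'other data\n' > "$tmpdir/file two.txt"

# -l lists matching files, --null ends each name with a NUL byte,
# and xargs -0 splits on NUL, so the space in the name is harmless
grep -l --null "test" "$tmpdir"/file* | xargs -0 rm

remaining=$(ls "$tmpdir")
echo "$remaining"

rm -r "$tmpdir"
```

Only "file one.txt" contains "test", so it is the one removed.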
 
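 6. Quiet output from grep
 Test whether a piece of text is present in a file
 #!/bin/bash
 # silent_grep.sh: test whether a file contains the given text
 
 if [ $# -ne 2 ];
 then
 echo "Usage: $0 match_text filename"
 exit 1
 fi
 
 match_text=$1
 filename=$2
 
 # -q prints nothing; the exit status is 0 when a match is found
 grep -q "$match_text" "$filename"
 if [ $? -eq 0 ];
 then 
 echo "The text exists in the file"
 else
 echo "The text does not exist in the file"
 fi
 
 $ ./silent_grep.sh Student student_data.txt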
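Because grep -q communicates only through its exit status, it can also be used directly as the condition of an if statement, without inspecting $?. The following is a minimal sketch of that variant; the helper name contains and the file contents are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'Student: Alice\nStudent: Bob\n' > "$tmpdir/student_data.txt"

# Hypothetical helper: succeeds when the text ($1) occurs in the file ($2)
contains() {
    grep -q -- "$1" "$2"
}

if contains "Student" "$tmpdir/student_data.txt"; then
    found="yes"
else
    found="no"
fi

if contains "Teacher" "$tmpdir/student_data.txt"; then
    missing="yes"
else
    missing="no"
fi
echo "Student: $found, Teacher: $missing"

rm -r "$tmpdir"
```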
 
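 7. Printing the lines before or after the matching text
 To print the three lines after a match, use -A:
 $ seq 10 | grep 5 -A 3
 
 To print the three lines before a match, use -B:
 $ seq 10 | grep 5 -B 3
 
 To print three lines both before and after a match, use -C:
 $ seq 10 | grep 5 -C 3
 
 When there are several matches, a line containing "--" separates each group of output
 $ echo -e "a\nb\nc\na\nb\nc" | grep a -A 1
 a
 b
 --
 a
 b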
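The three context options can be compared on the same input. This sketch captures each result so that the differences are easy to see.

```shell
# Three lines after the match
after=$(seq 10 | grep -A 3 "5")
# Three lines before the match
before=$(seq 10 | grep -B 3 "5")
# Three lines on both sides of the match
around=$(seq 10 | grep -C 3 "5")

# Join each result onto one line for display
echo "after:  $(echo $after)"
echo "before: $(echo $before)"
echo "around: $(echo $around)"
```

In the output of seq 10 only the line "5" matches, so -A yields 5 through 8, -B yields 2 through 5, and -C yields 2 through 8.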

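Splitting a file by column with cut

Extracting fields or columns

 cut -f field_list filename (field_list is the list of columns to display; it consists of column numbers separated by commas.)
 $ cut -f 2,3 filename 
 The tab character is the default field delimiter. Lines that contain no delimiter are printed unchanged; to skip such lines, use cut's -s option
 Extract a single field
 $ cut -f1 student_data.txt
 Extract several fields
 $ cut -f2,4 student_data.txt
 To print every column except the one given, use --complement
 $ cut -f3 --complement student_data.txt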

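 To specify the field delimiter, use -d
 $ cut -f2 -d";" delimited_data.txt
 Specifying a range of characters, bytes, or fields
 N- from the Nth byte, character, or field to the end of the line
 N-M from the Nth byte, character, or field to the Mth byte, character, or field
 -M from the first byte, character, or field to the Mth byte, character, or field
 Options:
 -b selects bytes
 -c selects characters
 -f selects fields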

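 Print the first through fifth characters
 $ cut -c1-5 filename
 Print the first two characters
 $ cut  filename -c-2
 When several character ranges are extracted, they are printed with nothing between them by default; use --output-delimiter to separate them
 $ cut filename -c1-3,6-9 --output-delimiter ","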
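The cut options above can be exercised on a few lines of made-up data. The column layout of student_data.txt here is an assumption for the example, and --output-delimiter assumes GNU coreutils cut.

```shell
tmpdir=$(mktemp -d)
# Hypothetical tab-separated data: No, Name, Mark, Percent
printf 'No\tName\tMark\tPercent\n1\tSarath\t45\t90\n2\tAlex\t49\t98\n' \
    > "$tmpdir/student_data.txt"

# Fields 2 and 4; tab is the default delimiter
names_pct=$(cut -f2,4 "$tmpdir/student_data.txt")
echo "$names_pct"

# A different delimiter with -d
second=$(echo "a;b;c" | cut -d";" -f2)
echo "$second"

# Two character ranges, joined by the chosen output delimiter
ranges=$(echo "abcdefghij" | cut -c1-3,6-9 --output-delimiter ",")
echo "$ranges"

rm -r "$tmpdir"
```

The last command prints `abc,fghi`: characters 1 to 3 and 6 to 9, with the comma marking where the two ranges meet.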

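Counting word frequency in a given file

#!/bin/bash
# Print each word in a file together with the number of times it occurs
if [ $# -ne 1 ];
then 
echo "Usage: $0 filename";
exit 1
fi

filename=$1
# -o prints each matched word on its own line; awk then tallies them
grep -E -o "\b[[:alpha:]]+\b" "$filename" |
awk '{ count[$0]++ }
END { printf("%-14s%s\n","Word","Count");
for (ind in count)
{ printf("%-14s%d\n",ind,count[ind]); }
}'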
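As an alternative to the awk tally, and not the method used in the script above, the same count can be sketched with sort and uniq -c, which also makes it easy to order the words by frequency. The input file words.txt is made up for the example.

```shell
tmpdir=$(mktemp -d)
printf 'the cat and the dog\nthe end\n' > "$tmpdir/words.txt"

# One word per line, group identical words, count them,
# then list the most frequent word first
top=$(grep -o -E "[[:alpha:]]+" "$tmpdir/words.txt" \
        | sort | uniq -c | sort -rn | head -n 1)

# uniq -c pads the count with spaces; awk normalizes it to "count word"
top_clean=$(echo "$top" | awk '{ print $1, $2 }')
echo "$top_clean"

rm -r "$tmpdir"
```

Here "the" occurs three times, so the pipeline reports `3 the`. Unlike the awk version, whose for-in loop visits words in an unspecified order, this variant produces sorted output.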