The Text-Processing Trio: grep in Detail

w1n0

Article link: https://blog.csdn.net/weixin_43414025/article/details/115048317

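Preface

Interviewers often ask about the three classic text-processing tools (grep, sed, and awk), so this post summarizes common grep usage.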

grep

Usage:

SYNOPSIS
       grep [options] PATTERN [FILE...]
       grep [options] [-e PATTERN | -f FILE] [FILE...]

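grep searches the input files named by FILE (or standard input if no file name is given, or if the file name is -) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
egrep is equivalent to grep -E, and fgrep is equivalent to grep -F.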
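
To illustrate the difference between the two modes, here is a minimal sketch. The file sample.txt and its contents are made up for this example. With -E the + acts as a quantifier, while with -F the pattern is treated as a literal string.

```shell
# Hypothetical sample data, created in a temporary directory.
tmp=$(mktemp -d)
printf 'a+b\naab\n' > "$tmp/sample.txt"

# -E (what egrep does): 'a+b' means one or more "a" followed by "b".
ere_out=$(grep -E 'a+b' "$tmp/sample.txt")

# -F (what fgrep does): 'a+b' is the literal three-character string.
fixed_out=$(grep -F 'a+b' "$tmp/sample.txt")

echo "grep -E: $ere_out"
echo "grep -F: $fixed_out"
rm -rf "$tmp"
```

Here grep -E selects only aab, and grep -F selects only a+b.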

options

Option	Meaning
-A NUM, --after-context=NUM	Print NUM lines of trailing context after each matching line. A line containing -- is printed between adjacent groups of matches.

cat hello.txt
sfasdefasedf
dsdqw                      
sd
fdggggede
edww233
444ff5yh
4
fggred554
3
dd
grep -A 2 'gggg' hello.txt 
fdggggede
edww233
444ff5yh

Option	Meaning
-a, --text	Process a binary file as if it were a text file; equivalent to --binary-files=text.

zhq@manjaro test1 cp /usr/bin/cat ./a.txt
zhq@manjaro test1 grep 'Arch' a.txt    
grep: a.txt: binary file matches
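
The transcript above shows what grep reports without -a. As a small sketch of the option itself, the following uses a made-up file bin.dat containing a NUL byte (which is what makes grep treat a file as binary) and searches it with -a:

```shell
# Hypothetical binary-looking file: the NUL byte marks it as binary for grep.
tmp=$(mktemp -d)
printf 'Arch\0Linux\n' > "$tmp/bin.dat"

# -a searches the file as text; -o prints only the matched part,
# which avoids writing the raw NUL byte to the terminal.
text_out=$(grep -a -o 'Arch' "$tmp/bin.dat")

echo "$text_out"
rm -rf "$tmp"
```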

Option	Meaning
-B NUM, --before-context=NUM	Print NUM lines of leading context before each matching line. A line containing -- is printed between adjacent groups of matches.

zhq@manjaro test1 grep -B 2 'gggg' hello.txt
dsdqw                      
sd
fdggggede

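Option	Meaning
-C NUM, --context=NUM	Print NUM lines of context both before and after each matching line. A line containing -- is printed between adjacent groups of matches.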

zhq@manjaro test1 grep -C 2 'gggg' hello.txt
dsdqw                      
sd
fdggggede
edww233
444ff5yh
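
The examples above contain a single match, so the -- separator never shows up. A minimal sketch with two separate match groups (ctx.txt is a made-up file) makes it visible:

```shell
# Hypothetical file with two matches that are several lines apart.
tmp=$(mktemp -d)
printf 'a1\nb\nc\nd\na2\ne\n' > "$tmp/ctx.txt"

# One line of trailing context per match; the two groups are not
# adjacent, so grep prints "--" between them.
ctx_out=$(grep -A 1 '^a' "$tmp/ctx.txt")

echo "$ctx_out"
rm -rf "$tmp"
```

The output is a1, b, a line containing --, then a2 and e.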

Option	Meaning
-b, --byte-offset	Print the byte offset of each output line within the input file before the line.

zhq@manjaro test1 cat hello.txt          
sfasdefasedf
dsdqw                      
sd
fdggggede
edww233
444ff5yh
4
fggred554
3
dd
zhq@manjaro test1 grep -b  'dsd' hello.txt
13:dsdqw 
# The first line is 12 characters plus \n, so the second line starts at byte offset 13
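
The offset arithmetic can be checked on a shortened copy of the file. This is only a sketch, and offsets.txt is a made-up name holding the first three lines of hello.txt:

```shell
# First three lines of the sample file: 12+1, 5+1 and 2+1 bytes.
tmp=$(mktemp -d)
printf 'sfasdefasedf\ndsdqw\nsd\n' > "$tmp/offsets.txt"

# Offsets are zero-based and count the newline of every preceding line.
offset_out=$(grep -b 'sd' "$tmp/offsets.txt")

echo "$offset_out"
rm -rf "$tmp"
```

All three lines contain sd, and they are reported at offsets 0, 13 and 19.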

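Option	Meaning
-c, --count	Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count the non-matching lines instead.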

zhq@manjaro test1 grep -c '[0-4]' b.txt                    
10
zhq@manjaro test1 cat b.txt
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
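
As a small sketch of how -c interacts with -v, the following counts matching and non-matching lines in a made-up file hosts.txt:

```shell
# Hypothetical hosts-style file: two address lines and one comment.
tmp=$(mktemp -d)
printf '127.0.0.1 name1\n# comment\n127.0.0.2 name2\n' > "$tmp/hosts.txt"

# Count lines that start with 127.
match_count=$(grep -c '^127' "$tmp/hosts.txt")

# Adding -v counts the lines that do NOT match.
nomatch_count=$(grep -vc '^127' "$tmp/hosts.txt")

echo "matching: $match_count, non-matching: $nomatch_count"
rm -rf "$tmp"
```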

Option	Meaning
-e PATTERN, --regexp=PATTERN	Use PATTERN as the pattern; useful for protecting patterns that begin with -.

zhq@manjaro test1 grep -e '^ed' hello.txt
edww233
15:07:58 zhq@manjaro test1 cat hello.txt 
sfasdefasedf
dsdqw                      
sd
fdggggede
edww233
444ff5yh
4
fggred554
3
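
The example above would work without -e as well. The option matters in two situations, sketched here with a made-up file opts.txt: when the pattern starts with a dash, and when several patterns are given at once.

```shell
# Hypothetical file whose lines partly start with dashes.
tmp=$(mktemp -d)
printf '%s\n' '-v flag' 'plain' '--help' > "$tmp/opts.txt"

# Without -e, grep would parse '-v' as its own option.
dash_out=$(grep -e '-v' "$tmp/opts.txt")

# -e can be repeated; a line is selected if any pattern matches.
multi_out=$(grep -e '^plain' -e '^--help' "$tmp/opts.txt")

echo "$dash_out"
echo "$multi_out"
rm -rf "$tmp"
```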

Option	Meaning
-n, --line-number	Prefix each output line with its line number within its input file.

zhq@manjaro test1 cat hello.txt | grep -n 'ds'
2:dsdqw

Option	Meaning
-m NUM, --max-count=NUM	Stop reading the file after NUM matching lines.

zhq@manjaro test1 cat hello.txt | grep  -m 4 's' 
sfasdefasedf
dsdqw                      
sd
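
In the transcript above only three lines contain s, so the limit of 4 is never reached. A sketch where the limit actually cuts the output short (many.txt is a made-up file):

```shell
# Hypothetical file with four matching lines.
tmp=$(mktemp -d)
printf 's1\ns2\ns3\ns4\n' > "$tmp/many.txt"

# grep stops reading after the second matching line.
limited_out=$(grep -m 2 's' "$tmp/many.txt")

echo "$limited_out"
rm -rf "$tmp"
```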

Option	Meaning
-o, --only-matching	Show only the part of each matching line that matches PATTERN.

zhq@manjaro test1 cat hello.txt | grep  -o 's'  
s
s
s
s
s
zhq@manjaro test1 cat hello.txt | grep  -o 's' | wc -l
5
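
Note that -c counts matching lines, whereas -o piped into wc -l counts individual matches. A minimal sketch on the first three lines of the sample file (stored in a made-up file count.txt) shows the difference:

```shell
# First three lines of the sample file; they contain 3, 1 and 1 "s".
tmp=$(mktemp -d)
printf 'sfasdefasedf\ndsdqw\nsd\n' > "$tmp/count.txt"

# -c counts lines that contain at least one match.
line_count=$(grep -c 's' "$tmp/count.txt")

# -o prints every match on its own line, so wc -l counts occurrences.
# tr strips the padding some wc implementations add.
match_total=$(grep -o 's' "$tmp/count.txt" | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $match_total"
rm -rf "$tmp"
```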

Option	Meaning
-R, -r, --recursive	Read all files under each directory, recursively. Equivalent to the -d recurse option.

zhq@manjaro test1 sudo grep -rne 'pacman' /etc/*.conf
/etc/healthd.conf:16:# N.B.: If you choose to use the beep command, you'll need to install it: pacman -S beep
/etc/logrotate.conf:20:# Ignore pacman saved files
/etc/pacman.conf:2:# /etc/pacman.conf
/etc/pacman.conf:4:# See the pacman.conf(5) manpage for option and repository directives
/etc/pacman.conf:13:#DBPath      = /var/lib/pacman/
/etc/pacman.conf:14:CacheDir = /var/cache/pacman/pkg/
/etc/pacman.conf:15:#LogFile     = /var/log/pacman.log
/etc/pacman.conf:16:#GPGDir      = /etc/pacman.d/gnupg/
/etc/pacman.conf:17:#HookDir     = /etc/pacman.d/hooks/
/etc/pacman.conf:18:HoldPkg      = pacman glibc manjaro-system
/etc/pacman.conf:42:# By default, pacman accepts packages signed by keys that its local keyring
/etc/pacman.conf:43:# trusts (see pacman-key and its man page), as well as unsigned packages.
/etc/pacman.conf:48:# NOTE: You must run `pacman-key --init` before first using pacman; the local
/etc/pacman.conf:50:# packagers with `pacman-key --populate archlinux manjaro`.
/etc/pacman.conf:55:#   - pacman will search repositories in the order defined here
/etc/pacman.conf:77:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:81:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:85:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:92:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:94:# An example of a custom package repository.  See the pacman manpage for
/etc/pacman-mirrors.conf:2:## /etc/pacman-mirrors.conf
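
In the command above the shell expands /etc/*.conf to regular files, so -r has little to do. A sketch that passes a directory instead (the conf tree and its files are made up for this example):

```shell
# Hypothetical directory tree with a nested subdirectory.
tmp=$(mktemp -d)
mkdir -p "$tmp/conf/sub"
printf 'first line\nuse pacman here\n' > "$tmp/conf/a.conf"
printf 'pacman first\n' > "$tmp/conf/sub/b.conf"

# -r descends into conf/ and conf/sub/; -n adds line numbers.
# The traversal order is not guaranteed, so the output is sorted.
rec_out=$(cd "$tmp" && grep -rn 'pacman' conf | LC_ALL=C sort)

echo "$rec_out"
rm -rf "$tmp"
```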

Option	Meaning
-v, --invert-match	Invert the sense of matching and select only the non-matching lines.

zhq@manjaro test1 grep -nv 'ds' hello.txt 
1:sfasdefasedf
3:sd
4:fdggggede
5:edww233
6:444ff5yh
7:4
8:fggred554
9:3
10:dd
zhq@manjaro test1 grep -n 'ds' hello.txt 
2:dsdqw

Option	Meaning
-i, --ignore-case	Ignore case distinctions in both PATTERN and the input files.

zhq@manjaro test1 grep -ni 'asd' hello.txt
1:sfasdefasedf
5:edASDEF

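Option	Meaning
-w, --word-regexp	Select only lines containing matches that form whole words. Word-constituent characters are letters, digits, and the underscore.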

zhq@manjaro test1 cat hello.txt                        
hello world
Generally speaking, long holidays are 
good for us college students. On the one hand, we 
have a lot of time to study by ourselves and thus 
improve weaknesses and further develop 
strengths. On the o
ther hand, we can take part-time jobs, which can make us realize responsibility 
and make ourselves better prepared for social life.
　　But every coin has two sides. Some students fail to make good use 
of their time and they 
are addicted to various computer games. I am afraid that they are likely to 
ruin themselves in this way
16:25:52 zhq@manjaro test1 cat hello.txt| grep -w long 
Generally speaking, long holidays are
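
To see what counts as a whole word, here is a minimal sketch with a made-up file words.txt. The match must not touch a letter, digit, or underscore on either side:

```shell
# Hypothetical lines: only the first contains "long" as a whole word.
tmp=$(mktemp -d)
printf '%s\n' 'long holidays' 'belong here' 'long_road' > "$tmp/words.txt"

# "belong" fails on the left side, "long_road" on the right side,
# because letters and the underscore are word characters.
word_out=$(grep -w 'long' "$tmp/words.txt")

echo "$word_out"
rm -rf "$tmp"
```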

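Option	Meaning
-l, --files-with-matches	Suppress normal output; instead print the name of each input file from which output would normally have been printed. Scanning of each file stops at the first match.

Uppercase -L (--files-without-match) does the opposite: it prints the names of the files that contain no match.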

zhq@manjaro ~ sudo grep -l "pacman" /var/log/*.log                                                                                                            
/var/log/pacman.log
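
A sketch comparing -l and -L side by side, using two made-up log files in a temporary directory:

```shell
# Hypothetical log files: only one of them mentions pacman.
tmp=$(mktemp -d)
printf 'installed pacman\n' > "$tmp/pacman.log"
printf 'nothing here\n' > "$tmp/other.log"

# -l lists files with at least one match.
with_match=$(cd "$tmp" && grep -l 'pacman' *.log)

# -L lists files without any match.
without_match=$(cd "$tmp" && grep -L 'pacman' *.log)

echo "with: $with_match, without: $without_match"
rm -rf "$tmp"
```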

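Option	Meaning
--include=PATTERN	When searching directories recursively, search only files whose names match PATTERN.
--exclude=PATTERN	When searching directories recursively, skip files whose names match PATTERN.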
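
The two options can be sketched on a small made-up src tree. The glob is quoted so that grep, not the shell, expands it:

```shell
# Hypothetical source tree with three file types.
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
printf 'int main(){\n' > "$tmp/src/a.c"
printf 'function main(){\n' > "$tmp/src/a.php"
printf 'main idea\n' > "$tmp/src/notes.txt"

# Search only files whose name matches *.c.
only_c=$(cd "$tmp" && grep -rl --include='*.c' 'main' src)

# Search everything except *.txt files; sorted for a stable order.
no_txt=$(cd "$tmp" && grep -rl --exclude='*.txt' 'main' src | LC_ALL=C sort)

echo "$only_c"
echo "$no_txt"
rm -rf "$tmp"
```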

grep examples (continuously updated)

1. Count the occurrences of main() in all .c and .php files under the src directory.

17:23:18 zhq@manjaro test1 ls
a.txt  b.txt  file  hello.txt  src
17:23:19 zhq@manjaro test1 grep -nr 'main()' src/*.c src/*.php
src/a.c:3:int main(){
src/a.php:2:  function main(){
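
The command above lists the matching lines. To get actual numbers, one possible sketch uses -c for a per-file line count and -o with wc -l for the total number of occurrences. The file contents below are made up for this example:

```shell
# Hypothetical sources: a.c has one main(), a.php has two.
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
printf '#include <stdio.h>\nint main(){\n  return 0;\n}\n' > "$tmp/src/a.c"
printf '<?php\n  function main(){\n  }\n  main();\n' > "$tmp/src/a.php"

# Matching lines per file. In a basic regex the parentheses are literal.
per_file=$(cd "$tmp" && grep -c 'main()' src/*.c src/*.php)

# Total occurrences: -o prints each match, -h drops the file names.
total=$(cd "$tmp" && grep -oh 'main()' src/*.c src/*.php | wc -l | tr -d ' ')

echo "$per_file"
echo "total: $total"
rm -rf "$tmp"
```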

2. Extract the lines of /etc/passwd that contain neither root nor nologin

17:25:30 zhq@manjaro test1 egrep -v 'root|nologin' /etc/passwd
git:x:970:970:git daemon user:/:/usr/bin/git-shell
ntp:x:87:87:Network Time Protocol:/var/lib/ntp:/bin/false
zhq:x:1000:1000:zhq:/home/zhq:/usr/bin/zsh
gitlab:x:105:105::/var/lib/gitlab:/usr/share/webapps/gitlab-shell/bin/gitlab-shell
postgres:x:959:959:PostgreSQL user:/var/lib/postgres:/bin/bash

3. Count the words ending in o in a file (English words by default, i.e. strings delimited by spaces on both sides)

18:03:56 zhq@manjaro test1 cat hello.txt | grep -oE  '\<[a-z]*o\>'   
hello
to
o
two
to
to
to
18:03:47 zhq@manjaro test1 cat hello.txt | grep -oE  '\<[a-z]*o\>' | wc -l   
7
18:04:01 zhq@manjaro test1 cat hello.txt
hello world
Generally speaking, long holidays are 
good for us college students. On the one hand, we 
have a lot of time to study by ourselves and thus 
improve weaknesses and further develop 
strengths. On the o
ther hand, we can take part-time jobs, which can make us realize responsibility 
and make ourselves better prepared for social life.
　　But every coin has two sides. Some students fail to make good use 
of their time and they 
are addicted to various computer games. I am afraid that they are likely to 
ruin themselves in this way

4. Extract the numbers between 0 and 255 from text and count them
I have two approaches; if you have other ideas, leave a comment.

ifconfig | grep -owE '25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]'

ifconfig | grep -oE '[0-9]+' | awk '$1<=255{print $1}'
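
Since ifconfig output differs between machines, the two approaches can be compared on fixed input. The following is only a sketch under stated assumptions: net.txt is a made-up file, and the range pattern with -w treats a number as valid only when it stands alone as a whole word, so the 0 in eth0 is not counted.

```shell
# Hypothetical ifconfig-like text.
tmp=$(mktemp -d)
printf '%s\n' \
  'inet 192.168.1.255 netmask 255.255.255.0' \
  'eth0 mtu 1500 txqueuelen 1000' \
  'rx 256' > "$tmp/net.txt"

# Approach 1: a regex for 0-255, restricted to whole words with -w.
range='25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]'
strict_count=$(grep -owE "$range" "$tmp/net.txt" | wc -l | tr -d ' ')

# Approach 2: extract every digit run, then filter numerically with awk.
# This also counts the 0 inside "eth0".
loose_count=$(grep -oE '[0-9]+' "$tmp/net.txt" | awk '$1 <= 255' | wc -l | tr -d ' ')

echo "whole-word regex: $strict_count, awk filter: $loose_count"
rm -rf "$tmp"
```

The counts differ by one (8 versus 9) because only the second approach picks up digits embedded in a word.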

5. Extract the IP addresses from text

ifconfig| grep -Eo "([0-9]{1,3}\.){3}[0-9]{1,3}"
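
The pattern above accepts any group of one to three digits, so it also matches strings such as 999.1.1.1. If that matters, a stricter variant can limit each octet to 0-255. This is a sketch, not the author's method; ip.txt and the octet variable are made up for this example:

```shell
# Hypothetical input with two valid addresses and one invalid one.
tmp=$(mktemp -d)
printf '%s\n' \
  'inet 192.168.1.10 netmask 255.255.255.0' \
  'bogus 999.1.1.1' > "$tmp/ip.txt"

# Original pattern: any 1-3 digits per octet.
loose=$(grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$tmp/ip.txt")

# Stricter pattern: each octet must be 0-255; -w keeps grep from
# matching the tail "99.1.1.1" inside "999.1.1.1".
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
strict=$(grep -owE "($octet\.){3}$octet" "$tmp/ip.txt")

echo "$loose"
echo "---"
echo "$strict"
rm -rf "$tmp"
```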

6. List the .log files under /var/log that contain an IP address

sudo grep -El "([0-9]{1,3}\.){3}[0-9]{1,3}" /var/log/*.log

7. Find the lines in a text that start with neither a letter nor whitespace

 grep  '^[^a-zA-Z[:space:]]'  hello.txt
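
A small sketch of what this pattern selects, using a made-up file lines.txt. Because the bracket expression must consume one character, empty lines are not selected either. Adding -c turns the listing into a count.

```shell
# Hypothetical file: a word, an indented line, a comment, a number, an empty line.
tmp=$(mktemp -d)
printf 'hello\n  indented\n#comment\n123\n\n' > "$tmp/lines.txt"

# Lines whose first character is neither a letter nor whitespace.
selected=$(grep '^[^a-zA-Z[:space:]]' "$tmp/lines.txt")

# The same selection as a count.
selected_count=$(grep -c '^[^a-zA-Z[:space:]]' "$tmp/lines.txt")

echo "$selected"
echo "count: $selected_count"
rm -rf "$tmp"
```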

8. Match lines that are neither empty nor start with #

zhq@manjaro test1 grep -Ev '^#|^$' hello.txt
hello world
good for us college students. On the one hand, we 
improve weaknesses and further develop 
strengths. On the o
cd3
cdd3
ther hand, we can take part-time jobs, which can make us realize responsibility 
and make ourselves better prepared for social life.
　　But every coin has two sides. Some students fail to make good use 
of their time and they 
are addicted to various computer games. I am afraid that they are likely to 
ruin themselves in this way
88888
$$$$$
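
The pattern '^#|^$' only catches a # in the first column and lines that are completely empty. For config files with indented comments or whitespace-only lines, a slightly broader pattern can be used. This is a sketch with a made-up file conf.txt:

```shell
# Hypothetical config file with indented comments and blank-looking lines.
tmp=$(mktemp -d)
printf '%s\n' '# comment' '  # indented comment' '' '   ' 'value=1' > "$tmp/conf.txt"

# Pattern from the example: keeps the indented comment and the
# whitespace-only line, so three lines remain.
basic_count=$(grep -Ev '^#|^$' "$tmp/conf.txt" | wc -l | tr -d ' ')

# Broader pattern: optional leading whitespace, then # or end of line.
strict_out=$(grep -Ev '^[[:space:]]*(#|$)' "$tmp/conf.txt")

echo "basic pattern keeps $basic_count lines"
echo "$strict_out"
rm -rf "$tmp"
```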