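The Three Musketeers of Text Processing: grep in Detail

Preface

Interviewers often ask about the "three musketeers" of text processing (grep, sed, and awk), so this post summarizes the common usage of the first one, grep.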

grep

Usage:
SYNOPSIS
       grep [options] PATTERN [FILE...]
       grep [options] [-e PATTERN | -f FILE] [FILE...]

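grep searches the input files named by FILE (or standard input, if no file name is given or the file name is -) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
egrep is equivalent to grep -E, and fgrep is equivalent to grep -F. Recent GNU grep releases mark both egrep and fgrep as obsolescent, so the -E and -F forms are preferable in new scripts.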
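To make the difference between the two modes concrete, here is a minimal sketch on a made-up two-line input (the sample text and variable names are assumptions for illustration). The same pattern text selects different lines depending on whether it is read as an extended regular expression or as a fixed string.

```shell
# Two sample lines: one contains the literal text "a+b", the other only "aab".
sample=$(printf 'a+b\naab\n')

# -E: "a+b" is a regex meaning one or more "a" followed by "b".
regex_hits=$(printf '%s\n' "$sample" | grep -E 'a+b')

# -F: "a+b" is a fixed string, so "+" has no special meaning.
fixed_hits=$(printf '%s\n' "$sample" | grep -F 'a+b')

echo "grep -E: $regex_hits"   # aab
echo "grep -F: $fixed_hits"   # a+b
```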

Options
Option: -A NUM, --after-context=NUM
Meaning: Print NUM lines of trailing context after each matching line. A line containing -- is printed between separate groups of matches.
cat hello.txt
sfasdefasedf
dsdqw                      
sd
fdggggede
edww233
444ff5yh
4
fggred554
3
dd
grep -A 2 'gggg' hello.txt 
fdggggede
edww233
444ff5yh
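Option: -a, --text
Meaning: Process a binary file as if it were a text file; this is equivalent to the --binary-files=text option. Without -a, as in the run below, grep only reports that the binary file matches.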
zhq@manjaro test1 cp /usr/bin/cat ./a.txt
zhq@manjaro test1 grep 'Arch' a.txt    
grep: a.txt: binary file matches
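For comparison, a small sketch of what -a changes. It uses a scratch file created with mktemp rather than a copy of /usr/bin/cat; the file contents are an assumption chosen so that the result is predictable.

```shell
# Hypothetical scratch file: the NUL byte makes GNU grep treat it as binary.
tmp=$(mktemp)
printf 'Arch\0Linux\n' > "$tmp"

# Without -a, grep only reports that the binary file matches
# (the exact wording of the message differs between grep versions).
grep 'Arch' "$tmp"

# With -a the matching line itself is printed; tr strips the NUL for display.
text_hit=$(grep -a 'Arch' "$tmp" | tr -d '\0')
echo "$text_hit"

rm -f "$tmp"
```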
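Option: -B NUM, --before-context=NUM
Meaning: Print NUM lines of leading context before each matching line. A line containing -- is printed between separate groups of matches.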
zhq@manjaro test1 grep -B 2 'gggg' hello.txt
dsdqw                      
sd
fdggggede
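Option: -C NUM, --context=NUM
Meaning: Print NUM lines of context both before and after each matching line. A line containing -- is printed between separate groups of matches.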
zhq@manjaro test1 grep -C 2 'gggg' hello.txt
dsdqw                      
sd
fdggggede
edww233
444ff5yh
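The runs above contain a single match, so the -- separator never shows up. The following sketch uses a made-up seven-line input with two matches that are far enough apart to form separate groups (the sample text and variable names are assumptions).

```shell
# Seven lines; the two matches are separated by more than the context width.
sample=$(printf 'a\nmatch1\nb\nc\nd\nmatch2\ne\n')

after=$(printf '%s\n' "$sample" | grep -A 1 'match')    # match + 1 line after
before=$(printf '%s\n' "$sample" | grep -B 1 'match')   # 1 line before + match
around=$(printf '%s\n' "$sample" | grep -C 1 'match')   # 1 line on each side

# Prints: a, match1, b, --, d, match2, e (one per line).
printf '%s\n' "$around"
```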
Option: -b, --byte-offset
Meaning: Print the byte offset of the line within the input file before each line of output.
zhq@manjaro test1 cat hello.txt          
sfasdefasedf
dsdqw                      
sd
fdggggede
edww233
444ff5yh
4
fggred554
3
dd
zhq@manjaro test1 grep -b  'dsd' hello.txt
13:dsdqw 
# Counting the \n, the first line is exactly 13 bytes long, so the second line starts at byte offset 13
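A short sketch of how the offsets add up, using the first three lines of the sample file without the trailing spaces (an assumption made to keep the arithmetic obvious). Combined with -o, GNU grep reports the offset of each match rather than of the line.

```shell
# Line lengths including the newline: 13, 6, 3 bytes.
sample=$(printf 'sfasdefasedf\ndsdqw\nsd\n')

# Offset of the start of the matching line.
line_offsets=$(printf '%s\n' "$sample" | grep -b 'dsd')

# With -o, the offset of every individual match.
match_offsets=$(printf '%s\n' "$sample" | grep -bo 'sd')

echo "$line_offsets"    # 13:dsdqw
echo "$match_offsets"   # 3:sd, 14:sd, 19:sd (one per line)
```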
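Option: -c, --count
Meaning: Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count the non-matching lines instead.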
zhq@manjaro test1 grep -c '[0-4]' b.txt                    
10
zhq@manjaro test1 cat b.txt
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
127.0.0.2 name2
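Note that -c counts lines, not occurrences. A minimal sketch on a made-up three-line input (sample and variable names are assumptions) shows the difference, with -o piped into wc -l as one way to count every occurrence.

```shell
# "aa" holds two occurrences of "a" but is a single matching line.
sample=$(printf 'aa\nb\na\n')

matching_lines=$(printf '%s\n' "$sample" | grep -c 'a')       # lines with a match
other_lines=$(printf '%s\n' "$sample" | grep -vc 'a')         # lines without one
occurrences=$(printf '%s\n' "$sample" | grep -o 'a' | wc -l)  # every single match

echo "lines: $matching_lines, non-matching: $other_lines, occurrences: $occurrences"
```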
Option: -e PATTERN, --regexp=PATTERN
Meaning: Use PATTERN as the pattern; useful to protect patterns that begin with -.
zhq@manjaro test1 grep -e '^ed' hello.txt
edww233
15:07:58 zhq@manjaro test1 cat hello.txt 
sfasdefasedf
dsdqw                      
sd
fdggggede
edww233
444ff5yh
4
fggred554
3
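The run above would work without -e as well. The sketch below, on made-up input, shows the two situations where -e is actually needed: a pattern that starts with a dash, and several patterns in one command.

```shell
# Hypothetical input; the first line starts with a dash.
sample=$(printf '%s\n' '-v flag' 'plain text' 'edww233')

# Without -e, grep would parse "-v" as the invert-match option.
dash_hit=$(printf '%s\n' "$sample" | grep -e '-v')

# Repeating -e selects lines that match any of the patterns.
multi_hit=$(printf '%s\n' "$sample" | grep -e '^ed' -e 'plain')

echo "$dash_hit"    # -v flag
echo "$multi_hit"   # plain text, edww233 (one per line)
```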
Option: -n, --line-number
Meaning: Prefix each line of output with its line number within its input file.
zhq@manjaro test1 cat hello.txt | grep -n 'ds'
2:dsdqw   
Option: -m NUM, --max-count=NUM
Meaning: Stop reading the file after NUM matching lines have been found.
zhq@manjaro test1 cat hello.txt | grep  -m 4 's' 
sfasdefasedf
dsdqw                      
sd
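In the run above the file has only three matching lines, so the limit of 4 is never reached. A sketch where the limit does cut the output short, on made-up input:

```shell
# Three lines contain "s"; -m 2 stops after the second one.
sample=$(printf 's1\nx\ns2\ns3\n')

first_two=$(printf '%s\n' "$sample" | grep -n -m 2 's')

echo "$first_two"   # 1:s1, 3:s2 (one per line)
```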
Option: -o, --only-matching
Meaning: Print only the parts of a matching line that match PATTERN, each on its own output line.
zhq@manjaro test1 cat hello.txt | grep  -o 's'  
s
s
s
s
s
zhq@manjaro test1 cat hello.txt | grep  -o 's' | wc -l
5
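Option: -R, -r, --recursive
Meaning: Read all files under each directory, recursively; this is equivalent to the -d recurse option. In recent GNU grep releases, -R additionally follows all symbolic links, while -r follows only those named on the command line.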
zhq@manjaro test1 sudo grep -rne 'pacman' /etc/*.conf
/etc/healthd.conf:16:# N.B.: If you choose to use the beep command, you'll need to install it: pacman -S beep
/etc/logrotate.conf:20:# Ignore pacman saved files
/etc/pacman.conf:2:# /etc/pacman.conf
/etc/pacman.conf:4:# See the pacman.conf(5) manpage for option and repository directives
/etc/pacman.conf:13:#DBPath      = /var/lib/pacman/
/etc/pacman.conf:14:CacheDir = /var/cache/pacman/pkg/
/etc/pacman.conf:15:#LogFile     = /var/log/pacman.log
/etc/pacman.conf:16:#GPGDir      = /etc/pacman.d/gnupg/
/etc/pacman.conf:17:#HookDir     = /etc/pacman.d/hooks/
/etc/pacman.conf:18:HoldPkg      = pacman glibc manjaro-system
/etc/pacman.conf:42:# By default, pacman accepts packages signed by keys that its local keyring
/etc/pacman.conf:43:# trusts (see pacman-key and its man page), as well as unsigned packages.
/etc/pacman.conf:48:# NOTE: You must run `pacman-key --init` before first using pacman; the local
/etc/pacman.conf:50:# packagers with `pacman-key --populate archlinux manjaro`.
/etc/pacman.conf:55:#   - pacman will search repositories in the order defined here
/etc/pacman.conf:77:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:81:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:85:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:92:Include = /etc/pacman.d/mirrorlist
/etc/pacman.conf:94:# An example of a custom package repository.  See the pacman manpage for
/etc/pacman-mirrors.conf:2:## /etc/pacman-mirrors.conf
Option: -v, --invert-match
Meaning: Invert the sense of matching and select only the non-matching lines.
zhq@manjaro test1 grep -nv 'ds' hello.txt 
1:sfasdefasedf
3:sd
4:fdggggede
5:edww233
6:444ff5yh
7:4
8:fggred554
9:3
10:dd
zhq@manjaro test1 grep -n 'ds' hello.txt 
2:dsdqw
Option: -i, --ignore-case
Meaning: Ignore case distinctions in both PATTERN and the input files.
zhq@manjaro test1 grep -ni 'asd' hello.txt
1:sfasdefasedf
5:edASDEF
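Option: -w, --word-regexp
Meaning: Select only those lines containing matches that form whole words. Word-constituent characters are letters, digits, and the underscore.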
zhq@manjaro test1 cat hello.txt                        
hello world
Generally speaking, long holidays are 
good for us college students. On the one hand, we 
have a lot of time to study by ourselves and thus 
improve weaknesses and further develop 
strengths. On the o
ther hand, we can take part-time jobs, which can make us realize responsibility 
and make ourselves better prepared for social life.
  But every coin has two sides. Some students fail to make good use 
of their time and they 
are addicted to various computer games. I am afraid that they are likely to 
ruin themselves in this way
16:25:52 zhq@manjaro test1 cat hello.txt| grep -w long 
Generally speaking, long holidays are 
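Because the underscore counts as a word character, -w can be stricter than expected. A sketch on made-up input (the sample lines are assumptions):

```shell
# "long" appears in all four lines, but only twice as a whole word.
sample=$(printf 'long\nalong\nlong_time\nso long!\n')

substring_hits=$(printf '%s\n' "$sample" | grep -c 'long')
word_hits=$(printf '%s\n' "$sample" | grep -w 'long')

echo "$substring_hits"   # 4
echo "$word_hits"        # long, so long! (one per line)
```

"along" fails because a letter precedes the match, and "long_time" fails because an underscore follows it.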
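Option: -l, --files-with-matches
Meaning: Suppress normal output; instead print the name of each input file from which output would normally have been printed. Scanning of each file stops at the first match.

The uppercase -L (--files-without-match) does the opposite: it prints the names of the files that contain no match.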

zhq@manjaro ~ sudo grep -l "pacman" /var/log/*.log                                                                                                            
/var/log/pacman.log
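A side-by-side sketch of -l and -L, using two scratch files in a temporary directory instead of /var/log (the file names and contents are assumptions):

```shell
# Hypothetical scratch directory with one matching and one non-matching file.
dir=$(mktemp -d)
printf 'pacman -Syu\n' > "$dir/with.log"
printf 'nothing here\n' > "$dir/without.log"

matching=$(cd "$dir" && grep -l 'pacman' with.log without.log)
non_matching=$(cd "$dir" && grep -L 'pacman' with.log without.log)

echo "-l: $matching"       # with.log
echo "-L: $non_matching"   # without.log

rm -rf "$dir"
```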
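Option: --include=PATTERN
Meaning: When searching directories recursively, search only files whose names match PATTERN (a shell glob such as *.c).
Option: --exclude=PATTERN
Meaning: When searching directories recursively, skip files whose names match PATTERN.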
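These two options have no example above, so here is a minimal sketch. The directory layout is made up for illustration and lives in a temporary directory; the output is piped through sort because the traversal order of a recursive search is not something to rely on.

```shell
# Hypothetical tree: src/a.c, src/lib/b.php, src/lib/notes.txt
dir=$(mktemp -d)
mkdir -p "$dir/src/lib"
printf 'int main(){\n' > "$dir/src/a.c"
printf 'function main(){\n' > "$dir/src/lib/b.php"
printf 'main() mentioned in notes\n' > "$dir/src/lib/notes.txt"

# Search only *.c files while recursing.
only_c=$(cd "$dir" && grep -rl --include='*.c' 'main()' src | sort)

# Search everything except *.txt files.
no_txt=$(cd "$dir" && grep -rl --exclude='*.txt' 'main()' src | sort)

echo "$only_c"   # src/a.c
echo "$no_txt"   # src/a.c, src/lib/b.php (one per line)

rm -rf "$dir"
```

Quoting the glob keeps the shell from expanding it before grep sees it.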

grep examples (continuously updated)

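  • 1. Find the occurrences of main() in all .c and .php files under the src directory and count them; each output line below is one matching line.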
17:23:18 zhq@manjaro test1 ls
a.txt  b.txt  file  hello.txt  src
17:23:19 zhq@manjaro test1 grep -nr 'main()' src/*.c src/*.php
src/a.c:3:int main(){
src/a.php:2:  function main(){
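To get actual numbers instead of a listing, one option is -c for a per-file count of matching lines, or -o piped into wc -l for the total number of occurrences. The sketch below recreates a small src directory in a temporary location; the file contents are assumptions, chosen so that one line holds two occurrences.

```shell
# Hypothetical src directory; a.php has two occurrences on a single line.
dir=$(mktemp -d)
mkdir -p "$dir/src"
printf '#include <stdio.h>\n\nint main(){ return 0; }\n' > "$dir/src/a.c"
printf '<?php\n  function main(){ main(); }\n' > "$dir/src/a.php"

# Matching lines per file.
per_file=$(cd "$dir" && grep -c 'main()' src/*.c src/*.php)

# Total occurrences across all files.
total=$(cd "$dir" && cat src/*.c src/*.php | grep -o 'main()' | wc -l)

echo "$per_file"   # src/a.c:1, src/a.php:1 (one per line)
echo "$total"      # 3

rm -rf "$dir"
```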
  • 2. Print the lines of /etc/passwd that contain neither root nor nologin.
17:25:30 zhq@manjaro test1 egrep -v 'root|nologin' /etc/passwd
git:x:970:970:git daemon user:/:/usr/bin/git-shell
ntp:x:87:87:Network Time Protocol:/var/lib/ntp:/bin/false
zhq:x:1000:1000:zhq:/home/zhq:/usr/bin/zsh
gitlab:x:105:105::/var/lib/gitlab:/usr/share/webapps/gitlab-shell/bin/gitlab-shell
postgres:x:959:959:PostgreSQL user:/var/lib/postgres:/bin/bash
  • 3. Count the words ending in o in a file (English words by default, i.e. strings with a space on both sides).
18:03:56 zhq@manjaro test1 cat hello.txt | grep -oE  '\<[a-z]*o\>'   
hello
to
o
two
to
to
to
18:03:47 zhq@manjaro test1 cat hello.txt | grep -oE  '\<[a-z]*o\>' | wc -l   
7
18:04:01 zhq@manjaro test1 cat hello.txt
hello world
Generally speaking, long holidays are 
good for us college students. On the one hand, we 
have a lot of time to study by ourselves and thus 
improve weaknesses and further develop 
strengths. On the o
ther hand, we can take part-time jobs, which can make us realize responsibility 
and make ourselves better prepared for social life.
  But every coin has two sides. Some students fail to make good use 
of their time and they 
are addicted to various computer games. I am afraid that they are likely to 
ruin themselves in this way
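  • 4. Extract the numbers between 0 and 255 from a text and count them (append | wc -l to either pipeline for the count).
    I have two approaches; if you have another idea, leave it in the comments.
ifconfig | grep -oE '\<(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\>'
ifconfig | grep -oE '[0-9]+' | awk '$1<=255{print $1}'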
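Since ifconfig output differs from machine to machine, a range pattern is easier to check against a fixed line. The sketch below is one way to match exactly the integers 0 to 255 as whole words, relying on the GNU \< and \> word anchors; the sample numbers are made up.

```shell
# Boundary values on both sides of the 0-255 range.
sample='0 7 42 199 255 256 300 1234'

# 250-255 | 200-249 | 100-199 | 0-99, anchored to whole words.
in_range=$(printf '%s\n' "$sample" |
  grep -oE '\<(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\>')
count=$(printf '%s\n' "$in_range" | wc -l)

echo "$in_range"   # 0, 7, 42, 199, 255 (one per line)
echo "$count"      # 5
```

The word anchors are what keep 256, 300, and 1234 from contributing partial matches such as 25 or 123.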
  • 5. Extract the IP addresses from a text.
ifconfig| grep -Eo "([0-9]{1,3}\.){3}[0-9]{1,3}"
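The pattern above accepts any group of one to three digits, so a string like 999.1.1.1 also passes. If that matters, a stricter variant can limit each octet to 0-255. This is a sketch on a made-up sample line; the octet variable is a helper introduced here for illustration.

```shell
# Hypothetical helper: one octet in the range 0-255.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
sample='inet 192.168.1.10 netmask 255.255.255.0 bogus 999.1.1.1'

# Loose: any four dot-separated groups of 1-3 digits.
loose=$(printf '%s\n' "$sample" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}')

# Strict: four octets in range, anchored to word boundaries.
strict=$(printf '%s\n' "$sample" | grep -oE "\<$octet(\.$octet){3}\>")

echo "$loose"    # also prints 999.1.1.1
echo "$strict"   # 192.168.1.10, 255.255.255.0 (one per line)
```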
  • 6. List the .log files under /var/log that contain an IP address.
sudo grep -El "([0-9]{1,3}\.){3}[0-9]{1,3}" /var/log/*.log 
  • 7. Select the lines that do not start with a letter or whitespace (add -c to count them).
 grep  '^[^a-zA-Z[:space:]]'  hello.txt
  • 8. Match the lines that are neither empty nor start with #.
zhq@manjaro test1 grep -Ev '^#|^$' hello.txt
hello world
good for us college students. On the one hand, we 
improve weaknesses and further develop 
strengths. On the o
cd3
cdd3
ther hand, we can take part-time jobs, which can make us realize responsibility 
and make ourselves better prepared for social life.
  But every coin has two sides. Some students fail to make good use 
of their time and they 
are addicted to various computer games. I am afraid that they are likely to 
ruin themselves in this way
88888
$$$$$
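The pattern '^#|^$' keeps indented comments and lines that contain only spaces. If those should be dropped too, one possible variant allows leading whitespace before the # or the end of the line. The sample input below is made up for illustration.

```shell
# Hypothetical config-like input with an indented comment and a spaces-only line.
sample=$(printf '# comment\n\n  # indented comment\nkey=1\n   \nvalue\n')

# Original pattern: still keeps the indented comment and the spaces-only line.
strict_count=$(printf '%s\n' "$sample" | grep -Evc '^#|^$')

# Variant: optional leading whitespace, then either "#" or end of line.
tolerant=$(printf '%s\n' "$sample" | grep -Ev '^[[:space:]]*(#|$)')

echo "$strict_count"   # 4
echo "$tolerant"       # key=1, value (one per line)
```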