[Linux] grep: Know How It Works, Not Just How to Use It

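I was recently talking with a friend, new to the field, about his first impressions after joining a company. One thing he mentioned was that he is not fluent with Linux commands: he can never remember the piles of options, and he is green with envy watching the veterans' fingers fly.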

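Nobody is fluent with commands at the start, and what you don't use regularly you forget. Typing a command and then blanking on its options is maddening, especially when a thorny problem is waiting to be solved. What do you do then? Open a browser and search? Ask the colleague next to you? The first is slow, since you have to dig through a lot of information, and at a company like Huawei that blocks the external network entirely it is not possible at all. The second interrupts your colleague and may earn you some spoken or unspoken scorn, heh. Here is the approach I use: read the man page.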

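When learning a command for the first time, get a quick overview online, then read the man page alongside it. Look up whatever you don't understand online, go back to the man page, and repeat until the man page makes sense. Why bother? First, some descriptions in the man page are still unclear after one reading; second, my English is poor. Once you understand the man page, the next time you forget something you can go straight to it, and that is very fast.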

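A colleague of mine knows her commands well and rattles off all sorts of options at speed. Once I saw her use an option I didn't recognize and asked about it. Her answer: "I don't know, that's just how you use it." Personally I don't like working that way. I prefer to know not only what works but why, because I want to become an expert.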

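In this article I want to summarize grep usage thoroughly. grep matters a lot: it makes you more efficient at work and, more importantly, more confident in front of an interviewer. A quick aside: once, while we were organizing a code review meeting, I asked whether we should invite colleagues from other teams. My manager thought for a moment and said: "We used to need that, but not anymore, because the experts are all on our team." He was right. Our team has five people, two of them architects, which is more than enough for a code review; and if we invited outsiders, who would dare show off in front of the experts? Oh dear, what was my point? My point is: do you want to become an expert too? Enough chatter, let's get to work!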

1. What exactly is grep

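Suppose someone asks you, "What is grep?" How would you answer? "A Linux command"; "a Linux search command"; both answers are pretty poor, though the poorest is of course "I don't know". To be fair, asking the question that way is pretty poor too.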

So what exactly is grep? First, here is the grep version I am using:

[Image: grep version output]

One more aside: many Linux commands, grep included, accept a -V option that prints version information.

Now let's see how the man page describes it:

[Image: grep description from the man page]

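grep searches the input files line by line and prints each line that matches the given pattern. Put simply, treat each line as a string: if any substring of it matches the pattern, the whole line is printed. grep also has two siblings, egrep and fgrep, which come up later.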

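As for how to pronounce grep: some say "ge rui pu", some say "ge ru pu"; I usually spell out the four letters, "g-r-e-p". Human speech or bird speech, whatever lets people communicate is good speech.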

2. grep options and usage in detail

To make testing easier, I created a file:

horen@heart> cat testfile.txt
line1: What's that smell?
line2: What's that noise?
line3: What is this line for?
line4: What are you up to?
line5: May I ask you a question?
line6: What does "drowsy" mean?
line7: What's this?
line8: What's that?
line9: Who does this belong to?
line10: Which one?

2.1 -A NUM, --after-context=NUM

-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line containing -- between contiguous groups of matches.

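We call a line that matches the pattern a matching line. Follow -A with a number NUM to print NUM more lines after each matching line. A matching line plus its NUM trailing lines forms a group, and adjacent groups are separated by a line containing two hyphens. In practice the -- only appears when lines were skipped between two groups, to show that part of the file is not displayed. (Note: when two matching lines are fewer than NUM lines apart, their context overlaps and they are printed as a single group.)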

horen@heart> grep -A 2 this testfile.txt
line3: What is this line for?
line4: What are you up to?
line5: May I ask you a question?
--
line7: What's this?
line8: What's that?
line9: Who does this belong to?
line10: Which one?

2.2 -B NUM, --before-context=NUM

       -B NUM, --before-context=NUM
              Print NUM lines of leading context before matching lines.  Places a line containing -- between contiguous groups of matches.

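Similar to -A, but prints the NUM lines before each matching line. Use these two options when you care about the lines around a match as well as the match itself.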

horen@heart> grep -B 2 this  testfile.txt
line1: What's that smell?
line2: What's that noise?
line3: What is this line for?
--
line5: May I ask you a question?
line6: What does "drowsy" mean?
line7: What's this?
line8: What's that?
line9: Who does this belong to?
horen@heart> grep -A 2 -B 2 drowsy testfile.txt
line4: What are you up to?
line5: May I ask you a question?
line6: What does "drowsy" mean?
line7: What's this?
line8: What's that?

2.3 -C NUM, --context=NUM

       -C NUM, --context=NUM
              Print NUM lines of output context.  Places a line containing -- between contiguous groups of matches.

Equivalent to combining -A NUM and -B NUM: prints NUM lines above and below each matching line.

horen@heart> grep -C 2 drowsy testfile.txt
line4: What are you up to?
line5: May I ask you a question?
line6: What does "drowsy" mean?
line7: What's this?
line8: What's that?

2.4 --binary-files=TYPE What to do with binary files?

       --binary-files=TYPE
              If the first few bytes of a file indicate that the file contains binary data, assume that the file
              is  of type TYPE.  By default, TYPE is binary, and grep normally outputs either a one-line message
              saying that a binary file matches, or no message if there is no match.  If TYPE is  without-match,
              grep  assumes  that a binary file does not match; this is equivalent to the -I option.  If TYPE is
              text, grep processes a binary file as if it were text; this is equivalent to the -a option.  Warn-
              ing:  grep  --binary-files=text  might output binary garbage, which can have nasty side effects if
              the output is a terminal and if the terminal driver interprets some of it as commands.

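When grep searches a file and finds that it is binary, it treats the file according to TYPE.
By default TYPE is binary: if a line matches, grep prints a message such as "Binary file XXX matches".
If TYPE is without-match, grep ignores binary files (which can be useful when searching many files at once); this is equivalent to grep -I.
If TYPE is text, grep matches against the binary file as if it were text, and the output may be garbage; this is equivalent to grep -a.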

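I have hardly ever used this option, but there is one case where you might. grep is not infallible and sometimes mistakes a text file for a binary one. When a line matches, grep then only gives you "Binary file XXX matches" even though you know the file is plain text. In that case use -a or --binary-files=text.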
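
The three TYPE values can be compared side by side. This is a minimal sketch, assuming GNU grep and a made-up file data.bin in a temporary directory; the NUL byte is there only to make grep classify the file as binary.

```shell
tmp=$(mktemp -d)
# Hypothetical sample: a NUL byte is enough for GNU grep to treat the file as binary.
printf 'hello world\0\nsecond line\n' > "$tmp/data.bin"

# Default (--binary-files=binary): at most a one-line notice, exit status 0 on a match.
grep hello "$tmp/data.bin" >/dev/null 2>&1; default_status=$?

# -a (--binary-files=text): lines are matched and counted like ordinary text.
as_text=$(grep -a -c hello "$tmp/data.bin")

# -I (--binary-files=without-match): the binary file is treated as having no match.
ignored_status=0
grep -I hello "$tmp/data.bin" >/dev/null 2>&1 || ignored_status=$?

echo "default=$default_status text=$as_text ignored=$ignored_status"
rm -rf "$tmp"
```

The exact wording of the "binary file matches" notice, and whether it goes to standard output or standard error, differs between grep versions, so the sketch only looks at exit statuses and counts.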

2.5 -a, --text Treat binary files as text

       -a, --text
              Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

Equivalent to --binary-files=text; see 2.4.

2.6 -I Ignore binary files

       -I     Process  a binary file as if it did not contain matching data; this is equivalent to the --binary-
              files=without-match option.

Equivalent to --binary-files=without-match; see 2.4.

2.7 -b, --byte-offset Show the byte offset of each matching line

       -b, --byte-offset
              Print the byte offset within the input file before each line of output.

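Prints, before each matching line, the offset of that line from the start of the input file. Note that by default this is neither the offset of the matched text within the line nor the offset of the matched text within the file. Personally I don't find it very useful; showing the line number would serve better.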

[renhongcai@localhost ~]$ grep What -b testfile.txt
0:line1: What's that smell?
26:line2: What's that noise?
52:line3: What is this line for?
82:line4: What are you up to?
142:line6: What does "drowsy" mean?
174:line7: What's this?
194:line8: What's that?
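
If the offset of the matched text itself is what you need, GNU grep documents that combining -b with -o reports the offset of each match instead of the offset of its line. A small sketch, assuming GNU grep and a made-up two-line input:

```shell
# Hypothetical input: line 1 starts at byte 0, line 2 at byte 9.
input=$(printf 'abc What\nxyz What\n')

# -b alone: offset of the start of each matching line.
line_offsets=$(printf '%s\n' "$input" | grep -b What)

# -b with -o: offset of each matched string.
match_offsets=$(printf '%s\n' "$input" | grep -ob What)

echo "$line_offsets"    # 0:abc What / 9:xyz What
echo "$match_offsets"   # 4:What / 13:What
```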

2.7 --colour[=WHEN], --color[=WHEN] Make the output stand out

       --colour[=WHEN], --color[=WHEN]
              Surround the matching string with the marker find in GREP_COLOR environment variable. WHEN may  be
              ‘never’, ‘always’, or ‘auto’

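Did you know grep output can look rather flashy? With a little setup, the matched text can be shown in a different color.
For example, to show matches in a lively green, I set one environment variable: export GREP_COLOR='1;32' (for other colors, see the appendix at the end of this article), and then add --colour=auto to the grep command.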

[renhongcai@localhost ~]$ export GREP_COLOR='1;32'
[renhongcai@localhost ~]$ grep What --colour=auto testfile.txt
line1: What's that smell?
line2: What's that noise?
line3: What is this line for?
line4: What are you up to?
line6: What does "drowsy" mean?
line7: What's this?
line8: What's that?

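The effect may not be visible in the listing above; try it yourself. What, love at first sight? All right, this one is on the house. One more trick: to avoid typing --colour every time you use grep, add these two lines to your shell's environment script: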
export GREP_COLOR='1;32'

export GREP_OPTIONS='--color=auto' # auto colors only when standard output is a terminal; always colors even when output goes to a pipe or a file; never disables color.
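
The difference between auto and always shows up as soon as the output is not a terminal. A minimal sketch, assuming GNU grep; the sample text and variable names are made up for illustration.

```shell
sample='What is this?'

# auto: the output goes into a pipe (the command substitution), so no color codes are added.
plain=$(printf '%s\n' "$sample" | grep --color=auto What)

# always: escape sequences are emitted even though the output is not a terminal.
colored=$(printf '%s\n' "$sample" | grep --color=always What)

esc=$(printf '\033')
case $colored in
  *"$esc"*) echo "always: output contains escape sequences" ;;
esac
if [ "$plain" = "$sample" ]; then
  echo "auto: output is plain text when piped"
fi
```

This is why auto is the safer default for an alias or environment setting: piping into another program does not pass escape sequences along. Note also that recent GNU grep releases treat GREP_OPTIONS as obsolescent, so a shell alias such as alias grep='grep --color=auto' is the more durable way to get the same effect.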

2.8 -c, --count Only count the matching lines

       -c, --count
              Suppress normal output; instead print a count of matching lines for each input file.  With the -v,
              --invert-match option (see below), count non-matching lines.

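Sometimes I don't care what the matching lines are, only how many there are; that is what -c is for. Note that grep prints matching lines by default but can be told to select non-matching lines instead (inverted matching, -v); combined with -v, -c reports the number of non-matching lines.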

[renhongcai@localhost ~]$ grep What -c testfile.txt
7

Seven lines in the test file contain "What", so the result is 7.
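
To see how -c interacts with -v, the sketch below recreates the test file from this article in a temporary directory and counts both ways; the variable names are only for illustration.

```shell
tmp=$(mktemp -d)
cat > "$tmp/testfile.txt" <<'EOF'
line1: What's that smell?
line2: What's that noise?
line3: What is this line for?
line4: What are you up to?
line5: May I ask you a question?
line6: What does "drowsy" mean?
line7: What's this?
line8: What's that?
line9: Who does this belong to?
line10: Which one?
EOF

matching=$(grep -c What "$tmp/testfile.txt")         # lines containing "What"
non_matching=$(grep -c -v What "$tmp/testfile.txt")  # lines without "What"
echo "$matching matching, $non_matching non-matching"
rm -rf "$tmp"
```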

2.9 -v, --invert-match Show only the non-matching lines, also known as filtering

       -v, --invert-match
              Invert the sense of matching, to select non-matching lines.

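Prints only the lines that do not match, the opposite of the default. This option is very useful: combined with pipes it filters out output you don't want. There may be an example in the common use cases section; if not, remind me to add one.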

[renhongcai@localhost ~]$ grep What -v testfile.txt
line5: May I ask you a question?
line9: Who does this belong to?
line10: Which one?

See, only these three lines in the test file lack "What", and all of them were found.

2.10 -D ACTION, --devices=ACTION Devices and sockets can be searched too

       -D ACTION, --devices=ACTION
              If an input file is a device, FIFO or socket, use ACTION to process it.   By  default,  ACTION  is
              read,  which  means that devices are read just as if they were ordinary files.  If ACTION is skip,
              devices are silently skipped.

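grep can also search an input file that is a device, FIFO, or socket. ACTION specifies how such special files are handled. The default ACTION is read, which searches them like ordinary files. (I do wonder whether grep would consume the data in a FIFO so that other processes, unaware of this, think the data was lost; I leave that for whoever needs it to verify.) If ACTION is skip, these special files are ignored.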

2.11 -d ACTION, --directories=ACTION

       -d ACTION, --directories=ACTION
              If an input file is a directory, use ACTION to process it.  By  default,  ACTION  is  read,  which
              means  that directories are read just as if they were ordinary files.  If ACTION is skip, directo-
              ries are silently skipped.  If ACTION is recurse, grep  reads  all  files  under  each  directory,
              recursively; this is equivalent to the -r option.

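When grep encounters a directory, what it does depends on ACTION. The default is read. In my tests, when an input file was a directory, grep treated it as an empty file, which amounts to ignoring it. I have to say the man page is unclear here: "directories are read just as if they were ordinary files". Does that tell you anything? You only find out by trying it. Back to the point: if ACTION is skip, directories are ignored; if ACTION is recurse, grep searches all files under the directory recursively, the same as -r.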

2.12 -E, --extended-regexp

       -E, --extended-regexp
              Interpret PATTERN as an extended regular expression (see below).

Treats PATTERN as an extended regular expression. There are several flavors of regular expression; I may cover them separately later.
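
The most visible difference between basic and extended syntax is alternation. A short sketch, assuming GNU grep and a made-up three-line sample; the backslashed \| form is a GNU extension to basic regular expressions.

```shell
sample=$(printf '%s\n' "What's that smell?" "What's that noise?" "What's this?")

# Basic syntax (default): | is an ordinary character, so nothing matches.
bre=$(printf '%s\n' "$sample" | grep -c 'smell|noise' || true)

# Extended syntax: | means "either side".
ere=$(printf '%s\n' "$sample" | grep -E -c 'smell|noise')

# Basic syntax with the backslashed operator (GNU extension).
bre_escaped=$(printf '%s\n' "$sample" | grep -c 'smell\|noise')

echo "bre=$bre ere=$ere bre_escaped=$bre_escaped"
```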

2.13 -e PATTERN, --regexp=PATTERN

       -e PATTERN, --regexp=PATTERN
              Use PATTERN as the pattern; useful to protect patterns beginning with -.
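
Two situations where -e earns its keep: a pattern that starts with a hyphen, and several patterns in one command. A minimal sketch with a made-up sample; giving -e more than once selects lines that match any of the patterns.

```shell
sample=$(printf '%s\n' '-v inverts the match' 'plain text' 'another line')

# Without -e, grep would parse '-v' as the invert-match option.
dash=$(printf '%s\n' "$sample" | grep -e '-v')

# Multiple -e options: a line is selected if it matches any one of them.
multi=$(printf '%s\n' "$sample" | grep -c -e plain -e another)

echo "$dash"
echo "$multi"
```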

2.14 -F, --fixed-strings Just a fixed string, not a regular expression

       -F, --fixed-strings
              Interpret  PATTERN  as  a  list  of  fixed  strings,  separated by newlines, any of which is to be
              matched.

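Equivalent to fgrep. Normally grep treats PATTERN as a regular expression. With -F, any regular expression metacharacters in PATTERN are treated as ordinary characters; that is, characters such as $, *, [, |, (, ) and \ are taken literally.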
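
A quick way to see the effect is to search for a pattern that contains a dot. A minimal sketch with a made-up three-line sample:

```shell
sample=$(printf '%s\n' 'a.b' 'axb' 'a*b')

# As a regular expression, "." matches any single character: all three lines match.
regex_hits=$(printf '%s\n' "$sample" | grep -c 'a.b')

# With -F the dot is literal: only the line containing "a.b" matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -F -c 'a.b')

echo "regex=$regex_hits fixed=$fixed_hits"
```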

2.15 -P, --perl-regexp Use Perl regular expressions

       -P, --perl-regexp
              Interpret PATTERN as a Perl regular expression.

As mentioned, there are several flavors of regular expression, each with its own syntax. This option selects Perl regular expressions.

2.16 -f FILE, --file=FILE Read multiple PATTERNs from a file

       -f FILE, --file=FILE
              Obtain patterns from FILE, one per line.  The empty file contains  zero  patterns,  and  therefore
              matches nothing.

This one is easy to understand. When we care about several keywords, that is, several PATTERNs, we can write them to a file, one per line, and load them with this option.
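
As a sketch of the idea, the following writes two patterns to a file and searches a shortened copy of the test file with them. The file names patterns.txt and testfile.txt are made up and live in a temporary directory.

```shell
tmp=$(mktemp -d)
# One pattern per line.
printf '%s\n' smell drowsy > "$tmp/patterns.txt"
printf '%s\n' \
  "line1: What's that smell?" \
  "line5: May I ask you a question?" \
  'line6: What does "drowsy" mean?' > "$tmp/testfile.txt"

# A line is printed if it matches any pattern in the file.
hits=$(grep -f "$tmp/patterns.txt" "$tmp/testfile.txt")
echo "$hits"
rm -rf "$tmp"
```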

2.17 -G, --basic-regexp Use basic regular expressions

       -G, --basic-regexp
              Interpret PATTERN as a basic regular expression (see below).  This is the default.

Good grief, this is the third kind of regular expression grep supports. In fact it is the one used by default.

2.18 -H, --with-filename Print the file name with each matching line

       -H, --with-filename
              Print the filename for each match.

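Prints the name of the file before each matching line. This is especially handy when searching several files and I want to know which of them contain the text I care about. grep also prints file names by default when more than one file is searched.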

[renhongcai@localhost ~]$ grep -H What testfile.txt
testfile.txt:line1: What's that smell?
testfile.txt:line2: What's that noise?
testfile.txt:line3: What is this line for?
testfile.txt:line4: What are you up to?
testfile.txt:line6: What does "drowsy" mean?
testfile.txt:line7: What's this?
testfile.txt:line8: What's that?

2.19 -h, --no-filename Do not print the file name with matching lines

       -h, --no-filename
              Suppress the prefixing of filenames on output when multiple files are searched.

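As just mentioned, grep prints the file name of each matching line by default when searching multiple files. The -h option does the opposite and turns that behavior off.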

2.20 -i, --ignore-case Ignore case

       -i, --ignore-case
              Ignore case distinctions in both the PATTERN and the input files.

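Simple but used all the time, so be sure to remember it. It ignores case; not much more to say. Just the other day a friend asked me: does -i ignore case in PATTERN or in the input file? I asked back: would ignoring it on only one side make any sense? Indeed, case is ignored on both sides.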

[renhongcai@localhost ~]$ grep what -i testfile.txt
line1: What's that smell?
line2: What's that noise?
line3: What is this line for?
line4: What are you up to?
line6: What does "drowsy" mean?
line7: What's this?
line8: What's that?

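See: without -i the match is case-sensitive and a search for "what" finds nothing; with -i all these lines come out. Once more, this is a very important option; be sure to remember it.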

2.21 -L, --files-without-match See which files contain no match

       -L, --files-without-match
              Suppress normal output; instead print the name of each input file from which no output would  nor-
              mally have been printed.  The scanning will stop on the first match.

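The English here is convoluted, but it means: suppress the normal output and instead print the names of the files that contain no match, that is, the files that do not contain the text you are looking for. As soon as a match is found in a file, grep stops scanning it and moves on to the next file.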

[renhongcai@localhost ~]$ grep those -L testfile*
testfile2.txt
testfile.txt

2.22 -l, --files-with-matches List only the names of files that contain a match

       -l, --files-with-matches
              Suppress normal output; instead print the name of each input file from which output would normally
              have been printed.  The scanning will stop on the first match.

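Similar to the previous option and also its opposite. Similar in that the English is just as convoluted; opposite in that this option lists only the names of files that do contain a match. It differs from -H, which prints the matching lines along with the file names.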

[renhongcai@localhost ~]$ grep What -l testfile*
testfile2.txt
testfile.txt

2.23 -m NUM, --max-count=NUM Limit the number of matching lines

       -m NUM, --max-count=NUM
              Stop reading a file after NUM matching lines.  If the input is standard input from a regular file,
              and  NUM  matching  lines  are  output, grep ensures that the standard input is positioned to just
              after the last matching line before exiting, regardless of the presence of trailing context lines.
              This  enables  a calling process to resume a search.  When grep stops after NUM matching lines, it
              outputs any trailing context lines.  When the -c or --count option is also  used,  grep  does  not
              output  a  count  greater than NUM.  When the -v or --invert-match option is also used, grep stops
              after outputting NUM non-matching lines.

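This option sets the maximum number of matching lines per file. A file may contain a great many matching lines when we only care about the first few. Once NUM matching lines have been found in a file, grep stops reading that file, even though much of it may remain unread, and moves on to the next input file.
Combined with -c (what did -c do again? It prints only the number of matching lines), the reported count is at most NUM.
Combined with -v (what did -v do again? It prints only non-matching lines), at most NUM non-matching lines are printed.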
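
The per-file nature of the limit is easy to check with two files. A minimal sketch, assuming GNU grep; a.txt and b.txt are made-up files in a temporary directory.

```shell
tmp=$(mktemp -d)
printf '%s\n' "What's that smell?" "What's that noise?" \
  "May I ask you a question?" "What's this?" > "$tmp/a.txt"
cp "$tmp/a.txt" "$tmp/b.txt"

# Reading stops after the second matching line of the file.
first_two=$(grep -m 2 What "$tmp/a.txt")

# With -c the count never exceeds NUM, although three lines contain "What".
capped=$(grep -c -m 2 What "$tmp/a.txt")

# The limit applies to each file separately: 2 lines from a.txt plus 2 from b.txt.
per_file=$(grep -h -m 2 What "$tmp/a.txt" "$tmp/b.txt" | wc -l)

echo "$first_two"
echo "capped=$capped per_file=$per_file"
rm -rf "$tmp"
```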

2.24 --mmap Make searching a bit faster

       --mmap If possible, use the mmap(2) system call to read input, instead  of  the  default  read(2)  system
              call.   In some situations, --mmap yields better performance.  However, --mmap can cause undefined
              behavior (including core dumps) if an input file shrinks while grep is operating,  or  if  an  I/O
              error occurs.

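This option tells grep to read input files with mmap, which can perform better in some situations. What exactly is mmap? For now just think of it as another way to read a file; explaining it properly would take us too far afield. Every gain has its cost: along with the performance comes some risk. If the input file changes size (strictly speaking, shrinks) while grep is working, grep may dump core or otherwise behave unpredictably.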

2.25 -n, --line-number Print the line number of each matching line

       -n, --line-number
              Prefix each line of output with the line number within its input file.

Like -H above, -n adds a prefix to each matching line, in this case its line number.
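
For illustration, here is -n on a made-up four-line sample; the number before the colon is the line's position in the input.

```shell
sample=$(printf '%s\n' "What's that smell?" "What is this line for?" \
  "What's that?" "Who does this belong to?")

numbered=$(printf '%s\n' "$sample" | grep -n this)
echo "$numbered"
# 2:What is this line for?
# 4:Who does this belong to?
```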

2.26 -o, --only-matching Show only the matching part

       -o, --only-matching
              Show only the part of a matching line that matches PATTERN.

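Shows only the part of a matching line that matches PATTERN. Perhaps you have grepped a file with extremely long lines: your poor terminal emulator cannot fit one on a single row, so a single match wraps across several rows and looks a mess. Try -o in that case.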
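
Besides taming long lines, -o is handy for extracting the matched strings themselves. A small sketch on a made-up sample: each match is printed on its own line, so it can be fed to sort -u to list the distinct words.

```shell
sample=$(printf '%s\n' "What's that smell?" "Who does this belong to?" \
  "Which one?" "What's this?")

# Each match of the pattern is printed on its own line; sort -u removes duplicates.
words=$(printf '%s\n' "$sample" | grep -o 'Wh[a-z]*' | sort -u | tr '\n' ' ')
echo "$words"
```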

2.27 --label=LABEL Treat standard input as a named file

       --label=LABEL
              Displays input actually coming from standard input as input coming from file LABEL.  This is espe-
              cially useful for tools like zgrep, e.g.  gzip -cd foo.gz |grep -H --label=foo something

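Treats data arriving on standard input as if it came from an ordinary file named LABEL. This is useful in some cases. Take gzip -cd foo.gz | grep -H --label=foo something: gzip decompresses the file and writes everything to standard output, which is piped to grep. Because of the --label option, grep reports that input as coming from a file named foo. The benefit is that grep searches the contents of a compressed file, leaving the file itself untouched, while still labeling each match with a meaningful name.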
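
The man page example can be tried end to end. A minimal sketch, assuming GNU grep and that gzip is installed; foo.gz is a throwaway file created in a temporary directory.

```shell
tmp=$(mktemp -d)
printf 'nothing here\nsomething here\n' | gzip -c > "$tmp/foo.gz"

# With --label, the match is reported as coming from "foo".
labeled=$(gzip -cd "$tmp/foo.gz" | grep -H --label=foo something)

# Without it, -H falls back to a generic name for standard input.
unlabeled=$(gzip -cd "$tmp/foo.gz" | grep -H something)

echo "$labeled"
echo "$unlabeled"
rm -rf "$tmp"
```

The generic name in the second case depends on the grep version and locale, which is another reason to set an explicit label in scripts.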

2.28 --line-buffered Flush the buffer promptly

       --line-buffered
              Use line buffering, it can be a performance penality.

Flushes the output buffer after every line, which obviously costs some performance. Rarely needed.

2.29 -q, --quiet, --silent Print nothing; I only want to know whether there is a match

       -q, --quiet, --silent
              Quiet; do not write anything to standard output.  Exit immediately with zero status if  any  match
              is found, even if an error was detected.  Also see the -s or --no-messages option.

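Writes nothing to standard output. If a match is found, grep exits immediately with status 0, even if an error was detected along the way. To suppress error messages as well, see -s.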
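
Since -q communicates only through the exit status, its natural home is a shell condition. A minimal sketch with a made-up sample and made-up variable names:

```shell
sample=$(printf '%s\n' 'What does "drowsy" mean?' "What's this?")

# The exit status alone drives the branch; nothing is printed by grep.
if printf '%s\n' "$sample" | grep -q drowsy; then
  result=found
else
  result=missing
fi

if printf '%s\n' "$sample" | grep -q sleepy; then
  other=found
else
  other=missing
fi

echo "drowsy: $result, sleepy: $other"
```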

2.30 -R, -r, --recursive Smart recursive search

       -R, -r, --recursive
              Read all files under each directory, recursively; this is equivalent to the -d recurse option.

         --include=PATTERN
              Recurse in directories only searching file matching PATTERN.

         --exclude=PATTERN
              Recurse in directories skip file matching PATTERN.

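Equivalent to -d recurse: -r tells grep to search the files under a directory recursively. Two companion options can be used with -r:
--include=PATTERN search only files whose names match PATTERN during recursion, for example only .cpp files
--exclude=PATTERN skip files whose names match PATTERN during recursion, for example .jpg files.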
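
A small directory tree makes the two filters concrete. This is a sketch assuming GNU grep; the src directory and its files are made up. Quoting the glob keeps the shell from expanding it before grep sees it.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src/sub"
printf 'int main() {}\n'  > "$tmp/src/a.cpp"
printf 'int main() {}\n'  > "$tmp/src/sub/b.cpp"
printf 'main in a note\n' > "$tmp/src/notes.txt"
printf 'main\n'           > "$tmp/src/pic.jpg"

# Only files whose names match *.cpp are searched.
only_cpp=$(grep -rl --include='*.cpp' main "$tmp/src" | sort)

# Everything except files whose names match *.jpg is searched.
no_jpg=$(grep -rl --exclude='*.jpg' main "$tmp/src" | sort)

printf 'include:\n%s\nexclude:\n%s\n' "$only_cpp" "$no_jpg"
rm -rf "$tmp"
```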

2.31 -s, --no-messages Suppress certain error messages

       -s, --no-messages
              Suppress error messages about nonexistent or unreadable files.  Portability note: unlike GNU grep,
              traditional grep did not conform to POSIX.2, because traditional grep lacked a -q option  and  its
              -s option behaved like GNU grep's -q option.  Shell scripts intended to be portable to traditional
              grep should avoid both -q and -s and should redirect output to /dev/null instead.

Suppresses error messages about files that do not exist or cannot be read.
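
Note that -s silences the message but not the exit status. A minimal sketch, assuming GNU grep; missing.txt is a path that deliberately does not exist.

```shell
tmp=$(mktemp -d)

status=0
# Capture anything grep writes, including standard error.
errors=$(grep -s What "$tmp/missing.txt" 2>&1) || status=$?

echo "status=$status errors='$errors'"
rm -rf "$tmp"
```

The error is still visible to a script through the exit status of 2, so -s is safe to use when only the message is unwanted.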

2.32 -U, --binary Do not strip carriage returns

       -U, --binary
              Treat the file(s) as binary.  By default, under MS-DOS and MS-Windows, grep guesses the file  type
              by  looking  at  the contents of the first 32KB read from the file.  If grep decides the file is a
              text file, it strips the CR characters from the original file contents (to  make  regular  expres-
              sions  with ^ and $ work correctly).  Specifying -U overrules this guesswork, causing all files to
              be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs
              at  the  end  of  each line, this will cause some regular expressions to fail.  This option has no
              effect on platforms other than MS-DOS and MS-Windows.

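This option tells grep not to strip carriage returns. I have never had occasion to use it, because I have never used grep under MS-DOS or MS-Windows. Some readers may be unsure of the difference between a carriage return and a line feed, so briefly: both are format control characters. What are those for? They were meant for printers, so that a printed document would have a layout people could read; today's editors generally do not display them. Old printers needed two instructions to start a new line: carriage return (CR), which moves the print head back to the start of the line, and line feed (LF), which moves the print head down one line (actually the paper moves up one line). In C these are \r\n. MS-Windows always ends lines this way, while Linux needs only \n. If you look closely, a text file edited on Windows and uploaded to a Linux server shows "^M" characters in vi; those are the carriage returns. Likewise, when you open a Linux text file on Windows, UltraEdit always asks "Convert file to DOS format?", meaning: do you want to replace the line feeds with CR/LF pairs? Usually there is no need; modern editors handle both.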

2.33 -u, --unix-byte-offsets Report offsets the Unix way

       -u, --unix-byte-offsets
              Report Unix-style byte offsets.  This switch causes grep to report byte offsets  as  if  the  file
              were  Unix-style text file, i.e. with CR characters stripped off.  This will produce results iden-
              tical to running grep on a Unix machine.  This option has no effect unless -b option is also used;
              it has no effect on platforms other than MS-DOS and MS-Windows.

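The -b option above prints the offset of each matching line, and under MS-Windows that offset counts the CR characters. The -u option leaves the CR characters out, giving Unix-style offsets.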

2.34 -w, --word-regexp Match whole words

       -w, --word-regexp
              Select only those lines containing matches that form whole words.  The test is that  the  matching
              substring must either be at the beginning of the line, or preceded by a non-word constituent char-
              acter.  Similarly, it must be either at the end of the line or followed by a non-word  constituent
              character.  Word-constituent characters are letters, digits, and the underscore.

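Matches whole words only. Say the file contains the word "Those" and you search for "Tho". Without -w, that line is printed as a match; with -w, "Those" is not a match for "Tho". The "Match whole word only" option in UltraEdit's search dialog means the same thing.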
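
The "Those" versus "Tho" case can be checked directly. A minimal sketch with a made-up two-line sample:

```shell
sample=$(printf '%s\n' 'Those were the days' 'Tho is a nickname')

# Plain search: "Tho" is a substring of "Those", so both lines match.
substring_hits=$(printf '%s\n' "$sample" | grep -c Tho)

# -w: the match must be delimited by non-word characters or line boundaries.
word_hits=$(printf '%s\n' "$sample" | grep -w Tho)

echo "substring=$substring_hits"
echo "$word_hits"
```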

2.35 -x, --line-regexp Match whole lines

       -x, --line-regexp
              Select only those matches that exactly match the whole line.

Similar to -w, but for whole lines: a line is selected only if the entire line matches PATTERN.
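
As a sketch, with a made-up three-line sample in which only one line consists of the pattern and nothing else:

```shell
sample=$(printf '%s\n' 'What' 'What is this?' 'So What')

# All three lines contain "What" somewhere.
any_hits=$(printf '%s\n' "$sample" | grep -c What)

# -x: only the line that is exactly "What" is selected.
exact_hits=$(printf '%s\n' "$sample" | grep -x -c What)

echo "any=$any_hits exact=$exact_hits"
```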

2.36 -y An obsolete option

-y     Obsolete synonym for -i.

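This option is obsolete; it does the same as -i and ignores case. I don't know why it was retired; perhaps to align with various standards, perhaps because -i is easier to remember: ignore-case, what could be more direct.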

2.37 -Z, --null Use a NUL character after the file name

       -Z, --null
              Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file
              name.  For example, grep -lZ outputs a zero byte after each file name instead of  the  usual  new-
              line.   This  option  makes  the output unambiguous, even in the presence of file names containing
              unusual characters like newlines.  This option can be used with commands like find  -print0,  perl
              -0, sort -z, and xargs -0 to process arbitrary file names, even those that contain newline charac-
              ters.

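Another obscure option. As mentioned, when searching several files grep prints the file name before each output line, separated from the matching line by a colon. -Z tells grep to emit a NUL character instead of that colon (or, with -l, instead of the newline after each file name).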
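
The option pays off when file names contain spaces or other awkward characters. A minimal sketch, assuming GNU grep and an xargs that supports -0; the file names are made up.

```shell
tmp=$(mktemp -d)
printf 'What now?\n'  > "$tmp/name with spaces.txt"
printf 'What else?\n' > "$tmp/plain.txt"
printf 'nothing\n'    > "$tmp/other.txt"

# -l lists matching files, -Z ends each name with NUL; xargs -0 splits on NUL only,
# so the name containing spaces reaches basename as a single argument.
names=$(grep -lZ What "$tmp"/*.txt | xargs -0 -n 1 basename)
echo "$names"
rm -rf "$tmp"
```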

3. Common grep use cases (to be updated...)

4. Appendix

4.1 ANSI/VT100 Terminal Control

Sets multiple display attribute settings. The following lists standard attributes:
0   Reset all attributes
1   Bright
2   Dim
4   Underscore
5   Blink
7   Reverse
8   Hidden

    Foreground Colours
30  Black
31  Red
32  Green
33  Yellow
34  Blue
35  Magenta
36  Cyan
37  White

    Background Colours
40  Black
41  Red
42  Green
43  Yellow
44  Blue
45  Magenta
46  Cyan
47  White

4.2 Regular expressions

REGULAR EXPRESSIONS
       A regular expression is a pattern that describes a set of strings.  Regular expressions  are  constructed
       analogously to arithmetic expressions, by using various operators to combine smaller expressions.

       Grep  understands  two  different  versions  of  regular  expression  syntax: “basic” and “extended.”  In
       GNU grep, there is no difference in available functionality using either syntax.   In  other  implementa-
       tions,  basic regular expressions are less powerful.  The following description applies to extended regu-
       lar expressions; differences for basic regular expressions are summarized afterwards.

       The fundamental building blocks are the regular expressions that match a single character.  Most  charac-
       ters, including all letters and digits, are regular expressions that match themselves.  Any metacharacter
       with special meaning may be quoted by preceding it with a backslash.

       A bracket expression is a list of characters enclosed by [ and ].  It matches  any  single  character  in
       that  list;  if  the  first character of the list is the caret ^ then it matches any character not in the
       list.  For example, the regular expression [0123456789] matches any single digit.

       Within a bracket expression, a range expression consists of two characters separated  by  a  hyphen.   It
       matches any single character that sorts between the two characters, inclusive, using the locale’s collat-
       ing sequence and character set.  For example, in the default C locale, [a-d]  is  equivalent  to  [abcd].
       Many  locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent
       to [abcd]; it might be equivalent to [aBbCcDd], for example.  To obtain the traditional interpretation of
       bracket  expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.

       Finally, certain named classes of characters are  predefined  within  bracket  expressions,  as  follows.
       Their  names  are  self  explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:],
       [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:].  For  example,  [[:alnum:]]  means
       [0-9A-Za-z],  except  the latter form depends upon the C locale and the ASCII character encoding, whereas
       the former is independent of locale and character set.  (Note that the brackets in these class names  are
       part  of  the  symbolic  names,  and  must be included in addition to the brackets delimiting the bracket
       list.)  Most metacharacters lose their special meaning inside lists.  To include a  literal  ]  place  it
       first in the list.  Similarly, to include a literal ^ place it anywhere but first.  Finally, to include a
       literal - place it last.

       The period .  matches any single character.  The symbol \w is a synonym for [[:alnum:]] and \W is a  syn-
       onym for [^[:alnum:]].

       The  caret  ^  and  the  dollar sign $ are metacharacters that respectively match the empty string at the
       beginning and end of a line.  The symbols \< and \> respectively match the empty string at the  beginning
       and  end  of  a  word.   The symbol \b matches the empty string at the edge of a word, and \B matches the
       empty string provided it’s not at the edge of a word.

       A regular expression may be followed by one of several repetition operators:
       ?      The preceding item is optional and matched at most once.
       *      The preceding item will be matched zero or more times.
       +      The preceding item will be matched one or more times.
       {n}    The preceding item is matched exactly n times.
       {n,}   The preceding item is matched n or more times.
       {n,m}  The preceding item is matched at least n times, but not more than m times.

       Two regular expressions may be concatenated; the resulting regular expression matches any  string  formed
       by concatenating two substrings that respectively match the concatenated subexpressions.

       Two  regular  expressions may be joined by the infix operator |; the resulting regular expression matches
       any string matching either subexpression.

       Repetition takes precedence over concatenation, which in turn takes precedence over alternation.  A whole
       subexpression may be enclosed in parentheses to override these precedence rules.

       The  backreference  \n,  where  n  is a single digit, matches the substring previously matched by the nth
       parenthesized subexpression of the regular expression.

       In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning;  instead
       use the backslashed versions \?, \+, \{, \|, \(, and \).

       Traditional egrep did not support the { metacharacter, and some egrep implementations support \{ instead,
       so portable scripts should avoid { in egrep patterns and should use [{] to match a literal {.

       GNU egrep attempts to support traditional usage by assuming that { is not special  if  it  would  be  the
       start  of  an invalid interval specification.  For example, the shell command egrep ’{1’ searches for the
       two-character string {1 instead of reporting a syntax error in the regular  expression.   POSIX.2  allows
       this behavior as an extension, but portable scripts should avoid it.

4.3 Environment variables

ENVIRONMENT VARIABLES
       Grep’s behavior is affected by the following environment variables.

       A  locale  LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, LANG, in that
       order.  The first of these variables that is set specifies the locale.  For example,  if  LC_ALL  is  not
       set,  but LC_MESSAGES is set to pt_BR, then Brazilian Portuguese is used for the LC_MESSAGES locale.  The
       C locale is used if none of these environment variables  are  set,  or  if  the  locale  catalog  is  not
       installed, or if grep was not compiled with national language support (NLS).

       GREP_OPTIONS
              This  variable specifies default options to be placed in front of any explicit options.  For exam-
              ple, if GREP_OPTIONS is ‘--binary-files=without-match --directories=skip’, grep behaves as if  the
              two  options  --binary-files=without-match  and  --directories=skip  had been specified before any
              explicit options.  Option specifications are separated by whitespace.   A  backslash  escapes  the
              next character, so it can be used to specify an option containing whitespace or a backslash.

       GREP_COLOR
              Specifies the marker for highlighting.

       LC_ALL, LC_COLLATE, LANG
              These  variables  specify  the  LC_COLLATE locale, which determines the collating sequence used to
              interpret range expressions like [a-z].

       LC_ALL, LC_CTYPE, LANG
              These variables specify the LC_CTYPE locale, which determines the type of characters, e.g.,  which
              characters are whitespace.

       LC_ALL, LC_MESSAGES, LANG
              These  variables  specify the LC_MESSAGES locale, which determines the language that grep uses for
              messages.  The default C locale uses American English message.

       POSIXLY_CORRECT
              If set, grep behaves as POSIX.2 requires; otherwise, grep behaves more like  other  GNU  programs.
              POSIX.2  requires  that  options that follow file names must be treated as file names; by default,
              such options are permuted to the front of the operand list and  are  treated  as  options.   Also,
              POSIX.2  requires  that  unrecognized  options  be  diagnosed as “illegal”, but since they are not
              really against the law the default is to diagnose them as “invalid”.   POSIXLY_CORRECT  also  dis-
              ables _N_GNU_nonoption_argv_flags_, described below.

       _N_GNU_nonoption_argv_flags_
              (Here  N is grep’s numeric process ID.)  If the ith character of this environment variable’s value
              is 1, do not consider the ith operand of grep to be an option, even if it appears to  be  one.   A
              shell can put this variable in the environment for each command it runs, specifying which operands
              are the results of file name wildcard expansion and therefore should not be  treated  as  options.
              This  behavior is available only with the GNU C library, and only when POSIXLY_CORRECT is not set.

DIAGNOSTICS
       Normally, exit status is 0 if selected lines are found and 1 otherwise.  But the exit status is 2  if  an
       error occurred, unless the -q or --quiet or --silent option is used and a selected line is found.
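
The three exit statuses can be observed with a small helper. This is a sketch assuming GNU grep; exit_status is a hypothetical helper function written for this illustration, and f.txt and missing.txt are made-up paths.

```shell
tmp=$(mktemp -d)
printf 'What is this?\n' > "$tmp/f.txt"

# Hypothetical helper: run a command quietly and print its exit status.
exit_status() {
  rc=0
  "$@" >/dev/null 2>&1 || rc=$?
  echo "$rc"
}

found=$(exit_status grep What "$tmp/f.txt")         # a line was selected
not_found=$(exit_status grep absent "$tmp/f.txt")   # no line was selected
error=$(exit_status grep What "$tmp/missing.txt")   # the file cannot be read

echo "found=$found not_found=$not_found error=$error"
rm -rf "$tmp"
```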

References:

http://www.gnu.org/software/grep/manual/grep.html

http://www.debian-administration.org/articles/460

http://www.regular-expressions.info/posix.html
