Linux: Finding Paragraphs That Contain a Keyword in a Text File (an awk Implementation)


Article link: https://blog.csdn.net/qingsong3333/article/details/78067778


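The grep command on AIX has a "-p" option that finds the paragraphs containing a keyword. Here a paragraph means a record delimited by blank lines, with at least one blank line between consecutive paragraphs. For example, the text below contains two paragraphs: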

$ cat test.txt
Hello,world

This is a file with
two paragraph.

The following command finds every paragraph in db2diag.log that records a database deactivation:

$ grep -ip 'DEACTIVATED' db2diag.log
2017-09-17-12.03.33.048373+480 E1594733A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-81                 APPID: *LOCAL.e105q5a.170917035458
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

2017-09-17-12.03.58.149245+480 E1601224A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-109                APPID: *LOCAL.e105q5a.170917040333
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

2017-09-17-12.16.49.507211+480 E1609705A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-125                APPID: *LOCAL.e105q5a.170917040401
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

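GNU grep on Linux has no -p paragraph option (the similarly named -P selects Perl-compatible regular expressions, which is unrelated), and no other grep option provides this feature. So take a different approach: can a whole paragraph be handled as if it were a single line? It can, and this is where awk shows its strength. awk has two built-in variables for this, described in the mawk manual as follows: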

ORS terminates each record on output, initially = "\n".
RS input record separator, initially = "\n".

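RS is the input record separator, a newline ('\n') by default, which is why awk normally treats each line as one record. If a paragraph is to be treated as one record, the separator between records becomes two or more consecutive newlines, so it is enough to set RS to "\n\n+". This relies on awk interpreting a multi-character RS as a regular expression, which mawk and gawk do, although POSIX leaves that behavior unspecified. The mawk manual provides an excellent template: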

   12. Multi-line records
       Since mawk interprets RS as a regular expression, multi-line records are easy.  Setting RS = "\n\n+",
       makes  one or more blank lines separate records.  If FS = " " (the default), then single newlines, by
       the rules for <SPACE> above, become space and single newlines are field separators.

              For example, if a file is "a b\nc\n\n", RS = "\n\n+" and FS = " ", then there  is  one  record
              "a b\nc"  with  three fields "a", "b" and "c".  Changing FS = "\n", gives two fields "a b" and
              "c"; changing FS = "", gives one field identical to the record.

       If you want lines with spaces or tabs to be considered blank, set RS = "\n([ \t]*\n)+".  For
       compatibility with other awks, setting RS = "" has the same effect as if blank lines are stripped
       from the front and back of files and then records are determined as if RS = "\n\n+".  Posix requires
       that "\n" always separates records when RS = "" regardless of the value of FS.  mawk does not support
       this convention, because defining "\n" as <SPACE> makes it unnecessary.

       Most of the time when you change RS for multi-line records, you will  also  want  to  change  ORS  to
       "\n\n" so the record spacing is preserved on output.

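The record and field counts in the manual's example can be checked directly. The snippet below is a small sketch that feeds the same input, "a b\nc\n\n", to awk and prints the record number and field count; the variable names are made up for illustration, and it assumes an awk such as mawk or gawk that accepts a regular expression in RS.

```shell
# The man page's example input: "a b\nc\n\n".
# With the default FS, the newline inside the record counts as whitespace.
default_fs=$(printf 'a b\nc\n\n' | awk 'BEGIN {RS = "\n\n+"} {print NR ":" NF}')

# With FS = "\n", each line of the record becomes one field.
newline_fs=$(printf 'a b\nc\n\n' | awk 'BEGIN {RS = "\n\n+"; FS = "\n"} {print NR ":" NF}')

printf 'default FS:   %s\n' "$default_fs"
printf 'FS = newline: %s\n' "$newline_fs"
```

As the manual describes, the first form should report one record with three fields and the second one record with two fields.
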
So on Linux the command is:
$ awk 'BEGIN {RS = "\n\n+";ORS = "\n\n"} /DEACTIVATED/ {print $0}' db2diag.log
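
To try the command on a small input, the sketch below recreates the two-paragraph test.txt shown earlier in a scratch directory and searches it. The temporary directory, the variable names, and the keywords "file" and "hello" are assumptions chosen for illustration. Note that the AIX command used -i (ignore case), while an awk pattern is case-sensitive; matching against tolower($0) is one way to approximate -i.

```shell
# Recreate the two-paragraph sample file in a scratch directory (hypothetical path).
tmpdir=$(mktemp -d)
printf 'Hello,world\n\nThis is a file with\ntwo paragraph.\n' > "$tmpdir/test.txt"

# Treat each blank-line-separated paragraph as one record; print those matching "file".
matched=$(awk 'BEGIN {RS = "\n\n+"; ORS = "\n\n"} /file/ {print $0}' "$tmpdir/test.txt")
printf '%s\n' "$matched"

# Case-insensitive variant: lowercase the record before matching, similar to grep -i.
matched_nocase=$(awk 'BEGIN {RS = "\n\n+"; ORS = "\n\n"} tolower($0) ~ /hello/ {print $0}' "$tmpdir/test.txt")
printf '%s\n' "$matched_nocase"

rm -rf "$tmpdir"
```

The first search should print the second paragraph, and the case-insensitive search should print the first one even though the file spells it "Hello".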

To invert the selection, that is, to print the paragraphs that do not contain the keyword, put ! in front of the pattern:
$ awk 'BEGIN {RS = "\n\n+";ORS = "\n\n"} !/DEACTIVATED/ {print $0}' db2diag.log
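
The inverted form can be checked on the same two-paragraph sample. This is only a sketch: the input is piped in with printf instead of read from a file, and the variable name and the keyword "file" are assumptions for illustration.

```shell
# Print the paragraphs that do NOT contain "file"; only the first paragraph qualifies.
unmatched=$(printf 'Hello,world\n\nThis is a file with\ntwo paragraph.\n' |
    awk 'BEGIN {RS = "\n\n+"; ORS = "\n\n"} !/file/ {print $0}')
printf '%s\n' "$unmatched"
```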

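Alternatively, RS can be set to the empty string, which switches awk into paragraph mode. This form is specified by POSIX and gives the same result for this search; the one difference is that blank lines at the very start of the file are ignored as well: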

$ awk 'BEGIN {RS = "";ORS = "\n\n"} /DEACTIVATED/ {print $0}' db2diag.log
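
The difference between the two settings shows up when the input starts with blank lines. The sketch below counts records both ways on made-up input; the sample text and variable names are assumptions. The gawk manual documents that RS = "\n\n+" gets none of the special handling of leading newlines that RS = "" gets, so the regex form may count an extra, empty first record, depending on the awk in use.

```shell
# Sample input with two leading blank lines (made-up data for illustration).
sample='\n\nfirst paragraph\n\nsecond paragraph\n'

# Paragraph mode: leading blank lines are skipped, so two records are counted.
para_count=$(printf "$sample" | awk 'BEGIN {RS = ""} END {print NR}')

# Regex mode: no special handling of leading blank lines, so the count
# may include an empty first record.
regex_count=$(printf "$sample" | awk 'BEGIN {RS = "\n\n+"} END {print NR}')

printf 'paragraph mode: %s records\n' "$para_count"
printf 'regex mode:     %s records\n' "$regex_count"
```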

Text can also be split into paragraphs in other ways; it is only a matter of setting RS to the right value.
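
As one illustration of a different separator, the sketch below assumes a hypothetical log in which records are divided by a line of three dashes instead of a blank line. The sample data, the variable names, and the keyword "failed" are all made up, and the multi-character RS again assumes an awk such as mawk or gawk.

```shell
# Hypothetical log in which records are separated by a line of three dashes.
records='id=1\nstatus=ok\n---\nid=2\nstatus=failed\n---\nid=3\nstatus=ok\n'

# Use the dashed line, together with its surrounding newlines, as the record separator.
failed=$(printf "$records" | awk 'BEGIN {RS = "\n---\n"} /failed/ {print $0}')
printf '%s\n' "$failed"
```

Only the second record, with both of its lines, should be printed.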