The Linux "three musketeers", three important commands: awk, sed, grep, versus ripgrep, a regular-expression search tool

ken2232

Last modified 2024-12-22 00:28:01

Category: linux. Tags: linux, operations, servers

First published 2024-06-29 20:45:25

Article link: https://blog.csdn.net/ken2232/article/details/140070142

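Dangerous commands, for example:

Note: the command-line trash tool Trash-Cli can be used in place of the rm command.

Why does the Linux rm command have no recycle bin? Trash-Cli: a command-line trash tool for Linux (***) https://blog.csdn.net/ken2232/article/details/136981360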

sudo apt install ripgrep
rg --version

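Its main features:

Searches recursively by default (grep needs -r or -R)
Automatically skips files matched by .gitignore, as well as hidden files and binary files
Can restrict the search by file type (rg -tpy foo searches only Python files; rg -Tjs foo excludes JavaScript files)
Supports most grep features (all the commonly used ones)
Supports many file encodings (UTF-8, UTF-16, latin-1, GBK, EUC-JP, Shift_JIS, and so on)
Can search common compressed files (gzip, xz, lzma, bzip2, lz4) when given -z/--search-zip
Highlights matches automatically
A shorter command name: rg is two characters, grep is four
By default does not match across lines or support fancy regex features such as look-around and backreferences; recent versions add -U/--multiline for the former and, when built with PCRE2, -P/--pcre2 for the latter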
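
To make a few of these defaults concrete, here is a minimal sketch. The directory layout and file names are invented for the demo, and the empty .git directory is only there so that ripgrep treats the tree as a repository and honors .gitignore; that part is an assumption about a typical ripgrep build, so check it against your own version.

```shell
#!/bin/sh
# Skip the demo when ripgrep is not installed.
command -v rg >/dev/null 2>&1 || { echo "rg not found; skipping demo" >&2; exit 0; }

# Build a throwaway tree (hypothetical file names) to search.
demo=$(mktemp -d)
mkdir "$demo/.git"                       # marks the tree as a repository
printf 'foo = 1\n'      > "$demo/app.py"
printf 'var foo = 1;\n' > "$demo/app.js"
printf 'foo in a log\n' > "$demo/debug.log"
printf '*.log\n'        > "$demo/.gitignore"

# Recursive by default; debug.log is skipped because .gitignore lists it.
all_hits=$(rg -l foo "$demo" | sort)

# -t restricts the search to one file type, -T excludes one.
py_hits=$(rg -l -tpy foo "$demo")
no_js_hits=$(rg -l -Tjs foo "$demo" | sort)

printf 'all:\n%s\npython only:\n%s\nwithout js:\n%s\n' \
    "$all_hits" "$py_hits" "$no_js_hits"
```

The path is passed explicitly because, with no path argument, rg may read standard input instead of the current directory when it is run from a script or pipeline.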

ripgrep, the fastest text search tool (outmanzzq)

Project page: https://github.com/BurntSushi/ripgrep

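ripgrep features

Very fast search.
A rich, practical set of search options.
Supports search-and-replace in its output (--replace rewrites the printed matches; it does not modify files).
Supports files in several Chinese encodings (specify the encoding with --encoding).
Default behavior can be changed through a configuration file such as .ripgreprc, located via the RIPGREP_CONFIG_PATH environment variable.
Can output results as JSON (--json).
Can search several compressed formats, such as gz and bz2 (with -z/--search-zip).
Can sort the output (--sort).
Reads .gitignore by default and skips the files it lists (use --no-ignore to turn this off).
Skips hidden files by default (use --hidden to search them).
Skips non-text files by default (use --text to search them as text).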
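
The switches in this list can be tried out on a scratch directory. What follows is a sketch under stated assumptions, not a reference: the file names, the port= pattern, and the ripgreprc file name are all made up, and an .ignore file is used instead of .gitignore because ripgrep honors it outside a git repository as well.

```shell
#!/bin/sh
# Skip the demo when ripgrep is not installed.
command -v rg >/dev/null 2>&1 || { echo "rg not found; skipping demo" >&2; exit 0; }

demo=$(mktemp -d)
printf 'port=8080\n'    > "$demo/app.conf"
printf 'port=9090\n'    > "$demo/.hidden.conf"
printf 'port=7070\n'    > "$demo/ignored.conf"
printf 'ignored.conf\n' > "$demo/.ignore"
printf '\326\320\316\304\n' > "$demo/gbk.txt"    # the bytes of "中文" in GBK

# Defaults: hidden and ignored files are skipped.
default_hits=$(rg -l 'port=' "$demo")
# --hidden also searches dotfiles; --no-ignore stops honoring ignore files.
hidden_hits=$(rg -l --hidden 'port=' "$demo")
unignored_hits=$(rg -l --no-ignore 'port=' "$demo")

# --replace changes only what is printed; the file on disk stays as it was.
replaced=$(rg -N --replace 'PORT=' 'port=' "$demo/app.conf")
on_disk=$(cat "$demo/app.conf")

# --json prints one JSON message per line; --sort path orders results by path.
json_out=$(rg --json 'port=' "$demo/app.conf")
sorted=$(rg -l --no-ignore --sort path 'port=' "$demo")

# --encoding tells rg how to decode the file before matching.
gbk_hit=$(rg -N --encoding gbk '中文' "$demo/gbk.txt")

# A config file holds one flag per line and is found via RIPGREP_CONFIG_PATH.
printf -- '--hidden\n' > "$demo/ripgreprc"
config_hits=$(RIPGREP_CONFIG_PATH="$demo/ripgreprc" rg -l 'port=' "$demo")

printf '%s\n' "$default_hits" "$replaced" "$sorted" "$gbk_hit"
```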

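ripgrep use cases

ripgrep is useful in many situations, for example:

Code search: quickly search source files for a particular code pattern or function.
Log analysis: scan large log files for specific text patterns or keywords.
Configuration checks: look through configuration files for specific settings or errors.

More generally, it fits any task that needs a fast search for specific text: its search engine makes it easy to locate the information you need in large volumes of text.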

A new Linux shift has begun, and the three text musketeers may lose their place!: https://blog.csdn.net/mingongge/article/details/136671557

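What are the Linux "three musketeers"?

awk: suited to formatting text files and to more complex processing and analysis of text;
sed: good at text editing, that is, transforming the text it matches;
grep: good at plain searching and matching of text.

Notes:

awk: named after its three authors, Aho, Weinberger, and Kernighan.
sed: short for Stream EDitor.
grep: the name comes from "globally search a regular expression and print" (the ed command g/re/p). It is a powerful text search tool that searches text for a given pattern, including regular expressions, and prints the matching lines by default.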
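
As a rough illustration of that division of labor, here is a minimal sketch that runs all three tools over the same small file. The file contents and the name:team:amount field layout are made up for the example.

```shell
#!/bin/sh
data=$(mktemp)
cat > "$data" <<'EOF'
alice:sales:100
bob:dev:250
carol:dev:300
EOF

# grep: select the lines that match a pattern (-c counts them instead).
dev_lines=$(grep ':dev:' "$data")
dev_count=$(grep -c ':dev:' "$data")

# sed: edit the text as it streams past; the file itself is not changed.
renamed=$(sed 's/:dev:/:engineering:/' "$data")

# awk: split each line into fields and compute over them.
dev_total=$(awk -F: '$2 == "dev" { sum += $3 } END { print sum }' "$data")

printf '%s\n' "$dev_lines" "$renamed" "dev rows: $dev_count, total: $dev_total"
rm -f "$data"
```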

------------------------------------------------------------

awk

$ awk -h
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:       GNU long options: (standard)
   -f progfile       --file=progfile
   -F fs           --field-separator=fs
   -v var=val       --assign=var=val
Short options:       GNU long options: (extensions)
   -b           --characters-as-bytes
   -c           --traditional
   -C           --copyright
   -d[file]       --dump-variables[=file]
   -D[file]       --debug[=file]
   -e 'program-text'   --source='program-text'
   -E file           --exec=file
   -g           --gen-pot
   -h           --help
   -i includefile       --include=includefile
   -l library       --load=library
   -L[fatal|invalid|no-ext]   --lint[=fatal|invalid|no-ext]
   -M           --bignum
   -N           --use-lc-numeric
   -n           --non-decimal-data
   -o[file]       --pretty-print[=file]
   -O           --optimize
   -p[file]       --profile[=file]
   -P           --posix
   -r           --re-interval
   -s           --no-optimize
   -S           --sandbox
   -t           --lint-old
   -V           --version

To report bugs, see node `Bugs' in `gawk.info'
which is section `Reporting Problems and Bugs' in the
printed version. This same information may be found at
https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
PLEASE do NOT try to report bugs by posting in comp.lang.awk,
or by using a web forum such as Stack Overflow.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
   awk '{ sum += $1 }; END { print sum }' file
   awk -F: '{ print $1 }' /etc/passwd
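
The two examples in the help text, plus the -v option listed above, can be tried on throwaway data. This is a small sketch; the file contents and the variable name min are invented for the demo.

```shell
#!/bin/sh
fruit=$(mktemp)
users=$(mktemp)
printf '3 apples\n4 pears\n5 plums\n' > "$fruit"
printf 'root:x:0:0\ndaemon:x:1:1\n'   > "$users"

# Sum the first field of every line, then print the total in the END rule.
total=$(awk '{ sum += $1 } END { print sum }' "$fruit")

# -F sets the field separator; here it prints the first colon-separated field.
first_name=$(awk -F: 'NR == 1 { print $1 }' "$users")

# -v assigns an awk variable from the command line before the program runs.
big=$(awk -v min=4 '$1 >= min { print $2 }' "$fruit")

printf 'total=%s first=%s\n%s\n' "$total" "$first_name" "$big"
rm -f "$fruit" "$users"
```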

sed

$ sed -h
sed: invalid option -- 'h'
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

-n, --quiet, --silent
                 suppress automatic printing of pattern space
      --debug
                 annotate program execution
-e script, --expression=script
                 add the script to the commands to be executed
-f script-file, --file=script-file
                 add the contents of script-file to the commands to be executed
--follow-symlinks
                 follow symlinks when processing in place
-i[SUFFIX], --in-place[=SUFFIX]
                 edit files in place (makes backup if SUFFIX supplied)
-l N, --line-length=N
                 specify the desired line-wrap length for the `l' command
--posix
                 disable all GNU extensions.
-E, -r, --regexp-extended
                 use extended regular expressions in the script
                 (for portability use POSIX -E).
-s, --separate
                 consider files as separate rather than as a single,
                 continuous long stream.
      --sandbox
                 operate in sandbox mode (disable e/r/w commands).
-u, --unbuffered
                 load minimal amounts of data from the input files and flush
                 the output buffers more often
-z, --null-data
                 separate lines by NUL characters
      --help     display this help and exit
      --version output version information and exit

If no -e, --expression, -f, or --file option is given, then the first
non-option argument is taken as the sed script to interpret. All
remaining arguments are names of input files; if no input files are
specified, then the standard input is read.

GNU sed home page: <https://www.gnu.org/software/sed/>.
General help using GNU software: <https://www.gnu.org/gethelp/>.
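
A few of the options from this help text in action, as a minimal sketch on a scratch file (the contents are made up). Note that the -i.bak form attaches the backup suffix directly to the option.

```shell
#!/bin/sh
f=$(mktemp)
printf 'alpha 1\nbeta 22\ngamma 333\n' > "$f"

# -n suppresses automatic printing; the p command prints only matching lines.
only_beta=$(sed -n '/^beta/p' "$f")

# -E enables extended regular expressions: + and ( ) need no backslashes.
swapped=$(sed -E 's/^([a-z]+) ([0-9]+)$/\2 \1/' "$f")

# -e can be repeated to chain several commands.
both=$(sed -e 's/alpha/A/' -e 's/beta/B/' "$f")

# -i edits the file in place; the .bak suffix keeps a backup of the original.
sed -i.bak 's/gamma/GAMMA/' "$f"
edited=$(cat "$f")
backup=$(cat "$f.bak")

printf '%s\n' "$only_beta" "$swapped" "$both" "$edited"
rm -f "$f" "$f.bak"
```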

grep

$ grep --help
Usage: grep [OPTION]... PATTERNS [FILE]...
Search for PATTERNS in each FILE.
Example: grep -i 'hello world' menu.h main.c
PATTERNS can contain multiple patterns separated by newlines.

Pattern selection and interpretation:
-E, --extended-regexp     PATTERNS are extended regular expressions
-F, --fixed-strings       PATTERNS are strings
-G, --basic-regexp        PATTERNS are basic regular expressions
-P, --perl-regexp         PATTERNS are Perl regular expressions
-e, --regexp=PATTERNS     use PATTERNS for matching
-f, --file=FILE           take PATTERNS from FILE
-i, --ignore-case         ignore case distinctions in patterns and data
      --no-ignore-case      do not ignore case distinctions (default)
-w, --word-regexp         match only whole words
-x, --line-regexp         match only whole lines
-z, --null-data           a data line ends in 0 byte, not newline

Miscellaneous:
-s, --no-messages         suppress error messages
-v, --invert-match        select non-matching lines
-V, --version             display version information and exit
      --help                display this help text and exit

Output control:
-m, --max-count=NUM       stop after NUM selected lines
-b, --byte-offset         print the byte offset with output lines
-n, --line-number         print line number with output lines
      --line-buffered       flush output on every line
-H, --with-filename       print file name with output lines
-h, --no-filename         suppress the file name prefix on output
      --label=LABEL         use LABEL as the standard input file name prefix
-o, --only-matching       show only nonempty parts of lines that match
-q, --quiet, --silent     suppress all normal output
      --binary-files=TYPE   assume that binary files are TYPE;
                            TYPE is 'binary', 'text', or 'without-match'
-a, --text                equivalent to --binary-files=text
-I                        equivalent to --binary-files=without-match
-d, --directories=ACTION how to handle directories;
                            ACTION is 'read', 'recurse', or 'skip'
-D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is 'read' or 'skip'
-r, --recursive           like --directories=recurse
-R, --dereference-recursive likewise, but follow all symlinks
      --include=GLOB        search only files that match GLOB (a file pattern)
      --exclude=GLOB        skip files that match GLOB
      --exclude-from=FILE   skip files that match any file pattern from FILE
      --exclude-dir=GLOB    skip directories that match GLOB
-L, --files-without-match print only names of FILEs with no selected lines
-l, --files-with-matches print only names of FILEs with selected lines
-c, --count               print only a count of selected lines per FILE
-T, --initial-tab         make tabs line up (if needed)
-Z, --null                print 0 byte after FILE name

Context control:
-B, --before-context=NUM print NUM lines of leading context
-A, --after-context=NUM   print NUM lines of trailing context
-C, --context=NUM         print NUM lines of output context
-NUM                      same as --context=NUM
      --color[=WHEN],
      --colour[=WHEN]       use markers to highlight the matching strings;
                            WHEN is 'always', 'never', or 'auto'
-U, --binary              do not strip CR characters at EOL (MSDOS/Windows)

When FILE is '-', read standard input. With no FILE, read '.' if
recursive, '-' otherwise. With fewer than two FILEs, assume -h.
Exit status is 0 if any line is selected, 1 otherwise;
if any error occurs and -q is not given, the exit status is 2.

Report bugs to: bug-grep@gnu.org
GNU grep home page: <http://www.gnu.org/software/grep/>
General help using GNU software: <https://www.gnu.org/gethelp/>
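
To see some of these options and the exit-status rule at work, here is a short sketch on a scratch directory. The file names and contents are invented for the demo.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir "$demo/src"
printf 'Hello World\nhello again\ngoodbye\n' > "$demo/src/main.c"
printf 'hello from notes\n'                  > "$demo/notes.txt"

# -i ignores case; -c prints a count of matching lines instead of the lines.
count=$(grep -ic 'hello' "$demo/src/main.c")

# -n prefixes each matching line with its line number.
numbered=$(grep -n 'goodbye' "$demo/src/main.c")

# -o prints only the matched part; -E switches to extended regular expressions.
word=$(grep -oE 'h[a-z]+' "$demo/notes.txt")

# -r recurses into directories; --include limits the files by glob; -l lists names only.
c_files=$(grep -rl --include='*.c' 'hello' "$demo")

# Exit status is 0 when a line is selected and 1 when none is; -q prints nothing.
grep -q 'nomatch' "$demo/notes.txt"
status=$?

printf '%s\n' "$count" "$numbered" "$word" "$c_files" "$status"
```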