Text-processing commands: xargs, sort, uniq, tr, cut, paste, wc, and more

qiushye

Category: linux commands, text processing. Tags: linux

Article link: https://blog.csdn.net/qiushye/article/details/105970006


1. Counting with wc

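wc -l [file]: print the number of lines in [file]
wc -c [file]: print the number of bytes in [file]
wc -m [file]: print the number of characters in [file]; if the text contains only single-byte characters, the result is the same as wc -c [file]
wc -w [file]: print the number of words in [file]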

➜  linux_commands cat test1
hello world!


oh my god!
你是
ttt
fff
gagds

➜  linux_commands wc -c test1
      47 test1
➜  linux_commands wc -l test1
       8 test1
➜  linux_commands wc -m test1
      43 test1
➜  linux_commands wc -w test1
      10 test1

wc -l [file1] [file2]...[file n]: print the line count of each file in turn, followed by the total on the last line

➜  linux_commands wc -l test1 hello.txt
       8 test1
       7 hello.txt
      15 total
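
In a script you usually want only the number, not the file name. The following is a minimal sketch, assuming a scratch file created with mktemp whose contents are made up for illustration. Redirecting the file to standard input keeps wc from printing a name, and tr -d ' ' strips the padding some implementations add.

```shell
#!/bin/sh
# Scratch file with made-up contents (an assumption for this sketch).
tmp=$(mktemp)
printf 'hello world!\n\noh my god!\n' > "$tmp"

# Redirecting the file to stdin makes wc print only the count.
lines=$(wc -l < "$tmp" | tr -d ' ')
bytes=$(wc -c < "$tmp" | tr -d ' ')
words=$(wc -w < "$tmp" | tr -d ' ')

echo "lines=$lines bytes=$bytes words=$words"
rm -f "$tmp"
```

For this input the script prints lines=3 bytes=25 words=5.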

2. Merging lines with paste

paste -s (-d [delim]) [file]: join all lines of the file into a single line, separated by tabs by default; -d sets a different delimiter

➜  linux_commands cat hello.txt
hi world
hi boys

she is saying hi
hi hello

HELLO everyone
➜  linux_commands paste -s hello.txt
hi world	hi boys		she is saying hi	hi hello		HELLO everyone
➜  linux_commands paste -s -d "#" hello.txt
hi world#hi boys##she is saying hi#hi hello##HELLO everyone
➜  linux_commands paste -s -d "\n" hello.txt (equivalent to cat hello.txt)
hi world
hi boys

she is saying hi
hi hello

HELLO everyone

paste (-d [delim]) [file1] [file2]: merge the two files side by side, line by line, separated by a tab by default; -d sets a different delimiter

➜  linux_commands cat test1
hello world!


oh my god!
你是
ttt
fff
gagds
➜  linux_commands cat test4
this is test4

oh my god
hey man
➜  linux_commands paste test1 test4
hello world!	this is test4

	oh my god
oh my god!	hey man
你是
ttt
fff
gagds
➜  linux_commands paste -d "#" test1 test4
hello world!#this is test4
#
#oh my god
oh my god!#hey man
你是#
ttt#
fff#
gagds#

ls | paste - - - : list the files in the current directory in three columns

➜  linux_commands ls | paste - - -
diff.txt	hello.txt	input.txt
ls.cmd	regex.txt	test1
test3	test4	test5
test6	test7	tt
ut

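sed = [file] | paste -s -d '\t\n' - - : number each line of [file]. sed = prints every line number on its own line before the line itself, and paste joins each pair by alternating between tab and newline as the delimiter. sed is a stream editor and is not covered in detail here.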

➜  linux_commands sed = test1 | paste -s -d '\t\n' - -
1	hello world!
2
3
4	oh my god!
5	你是
6	ttt
7	fff
8	gagds
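
The three uses of paste shown in this section can be checked with a small script. This is only a sketch: the scratch directory and the file names left and right are made up for illustration.

```shell
#!/bin/sh
# Scratch files with made-up names and contents (assumptions for this sketch).
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/left"
printf '1\n2\n' > "$dir/right"

# -s joins all lines of one file into a single line.
joined=$(paste -s -d ',' "$dir/left")

# Without -s the files are merged side by side; the shorter file gives empty fields.
side=$(paste -d ':' "$dir/left" "$dir/right")

# Each "-" takes the next line from stdin, so three dashes give three columns.
columns=$(printf '%s\n' 1 2 3 4 5 6 | paste -d ' ' - - -)

printf '%s\n' "$joined" "$side" "$columns"
rm -rf "$dir"
```

The last line of the side-by-side result is "c:", because right has no third line.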

3. Cutting parts of lines with cut

cut -c 3-5: for each line of standard input, print characters 3 through 5; 3 and 5 are just example values

➜  linux_commands cut -c 3-5
123456    (first input)
345        (characters 3 to 5)
qw        (second input)
            (shorter than 3 characters, so the output is empty)
^C
➜  linux_commands

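cut -c 3-5 [file]: for each line of [file], print characters 3 through 5. Omitting the 5 (3-) cuts through the end of the line; likewise, omitting the 3 (-5) cuts from the start of the line.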

➜  linux_commands cut -c 3-5 test1
llo


 my

t
f
gds
➜  linux_commands cut -c 3- test1
llo world!


 my god!

t
f
gds
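
The range forms of -c are easy to try without a file. A minimal sketch, where the sample strings are simply taken from the examples above:

```shell
#!/bin/sh
line='hello world!'

middle=$(printf '%s\n' "$line" | cut -c 3-5)    # characters 3 to 5
tail_part=$(printf '%s\n' "$line" | cut -c 3-)  # character 3 to end of line
head_part=$(printf '%s\n' "$line" | cut -c -5)  # start of line to character 5
short=$(printf 'qw\n' | cut -c 3-5)             # line shorter than 3 characters

printf '[%s]\n' "$middle" "$tail_part" "$head_part" "$short"
```

The brackets in the output make the empty result for the short line visible.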

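cut -d':' -f2 : for each line of standard input, split on ':' and print the 2nd field. ':' is just the delimiter chosen here; it could also be a space or a semicolon, and the default is tab. A line that contains no delimiter is printed unchanged.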

➜  linux_commands cut -d':' -f2
aa:bb:cc
bb
aa
aa
aa:

^C
➜  linux_commands

cut -s -d':' -f2 : for each line of standard input, split on ':' and print the 2nd field; -s suppresses lines that contain no delimiter

➜  linux_commands cut -s -d':' -f2
aa
aa:bb
bb
^C
➜  linux_commands

cut -d';' -f2,3 : for each line of standard input, split on ';' and print fields 2 and 3. Changing -f2,3 to -f2- prints everything from the 2nd field to the end of the line.

➜  linux_commands cut -d';' -f2,3
aa;bb;cc;dd
bb;cc
aa
aa
^C
➜  linux_commands
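
The field options can be compared side by side in a short script. This is a sketch using the same sample inputs as the interactive sessions above; nothing else is assumed.

```shell
#!/bin/sh
# Select one field; a line without the delimiter passes through unchanged.
second=$(printf 'aa:bb:cc\n' | cut -d ':' -f 2)
no_delim=$(printf 'aa\n' | cut -d ':' -f 2)

# -s drops lines that do not contain the delimiter at all.
suppressed=$(printf 'aa\naa:bb\n' | cut -s -d ':' -f 2)

# A list (2,3) and an open-ended range (2-).
pair=$(printf 'aa;bb;cc;dd\n' | cut -d ';' -f 2,3)
rest=$(printf 'aa;bb;cc;dd\n' | cut -d ';' -f 2-)

printf '%s\n' "$second" "$no_delim" "$suppressed" "$pair" "$rest"
```

Selected fields are printed with the original delimiter between them, which is why the last two results still contain semicolons.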

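cut (-n) -b 2-4: for each line of standard input, print bytes 2 through 4; with -n, multibyte characters are not split. GNU cut accepts -n but ignores it, so the second result below may differ on Linux.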

➜  linux_commands cut -b 2-4
asdf
sdf
^C
➜  linux_commands cut -n -b 2-4
晚上去吃饭
晚
^C
➜  linux_commands

4. Translating characters with tr

tr [ch1] [ch2] < [file]: replace every occurrence of the character [ch1] in [file] with [ch2] and print the result; the file itself is not modified

➜  linux_commands cat hello.txt
hi world
hi boys

she is saying hi
hi hello

HELLO everyone
➜  linux_commands tr h o < hello.txt
oi world
oi boys

soe is saying oi
oi oello

HELLO everyone

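tr [str1] [str2] < [file]: replace each character of [str1] that occurs in [file] with the character at the same position in [str2]. If the strings differ in length, extra characters in [str2] are ignored, and a shorter [str2] is padded with its last character.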

➜  linux_commands tr 'hi' 'oh' < hello.txt  <=> tr 'h' 'o' < hello.txt | tr 'i' 'h'
oh world
oh boys

soe hs sayhng oh
oh oello (h and i are replaced independently)

HELLO everyone
➜  linux_commands tr hi hello < hello.txt
he world (only h->h and i->e apply; the extra characters of hello are ignored)
he boys

she es sayeng he
he hello

HELLO everyone
➜  linux_commands tr boys men < hello.txt
hi werld
hi menn (boys is longer than men, so the extra character maps to n, the last character of men)

nhe in naning hi
hi helle

HELLO evernene
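
The three length cases can be reproduced with a short script. This is a sketch; note that padding a shorter second set with its last character is what GNU and BSD tr do, while POSIX leaves that case unspecified, so treat the last result as implementation behavior rather than a guarantee.

```shell
#!/bin/sh
# Same length: h->o and i->h are applied character by character.
equal=$(printf 'hi hello\n' | tr 'hi' 'oh')

# Second set longer: only h->h and i->e are used, "llo" is ignored.
longer=$(printf 'hi hello\n' | tr 'hi' 'hello')

# Second set shorter: GNU and BSD tr pad it with its last character (s->n).
shorter=$(printf 'hi boys\n' | tr 'boys' 'men')

printf '%s\n' "$equal" "$longer" "$shorter"
```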

tr -d [str] < [file]: delete every character of [file] that appears in [str]

➜  linux_commands tr -d hi < hello.txt
 world
 boys

se s sayng
 ello

HELLO everyone

tr "[:lower:]" "[:upper:]" < [file]: convert the lowercase letters in the file to uppercase

➜  linux_commands tr "[:lower:]" "[:upper:]" < hello.txt
HI WORLD
HI BOYS

SHE IS SAYING HI
HI HELLO

HELLO EVERYONE

tr -s [ch1] [ch2]: replace [ch1] with [ch2] in standard input, then squeeze each run of repeated [ch2] into a single one

➜  linux_commands echo "Hello    my  friend" | tr -s ' ' '\n'  (each run of spaces becomes a single newline)
Hello
my
friend

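tr -d -c '[characters]' : keep only the characters of standard input that belong to the set [characters], i.e. delete the complement of the set. tr takes character sets, not regular expressions: the square brackets in '[0-9]' below are themselves members of the set, so '0-9' is the more precise way to write it.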

➜  linux_commands echo "22aa" | tr -d '[0-9]'
aa
➜  linux_commands echo "22aa" | tr -d -c '[0-9]'
22%  (the newline is deleted as well; % is how zsh marks output without a trailing newline)
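
The delete, squeeze and complement options can be checked together. A minimal sketch using the sample strings from this section, with the digit set written as '0-9' rather than '[0-9]':

```shell
#!/bin/sh
# Character classes: lowercase to uppercase.
upper=$(printf 'hi world\n' | tr '[:lower:]' '[:upper:]')

# -d deletes every character listed in the set.
deleted=$(printf 'hi hello\n' | tr -d 'hi')

# -s with two sets: translate spaces to newlines, then squeeze repeated newlines.
squeezed=$(printf 'Hello    my  friend\n' | tr -s ' ' '\n')

# -d -c deletes everything that is NOT in the set, including the newline.
digits=$(printf '22aa\n' | tr -d -c '0-9')

printf '[%s]\n' "$upper" "$deleted" "$squeezed" "$digits"
```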

5. Sorting text with sort

sort (-r) [file]: sort all lines of [file] in ascending order; -r sorts in descending order, and -R shuffles the lines randomly

➜  linux_commands cat hello.txt
hi world
hi boys
hello boys
HELLO everyone
she is saying hi
hi boys
➜  linux_commands sort hello.txt
HELLO everyone
hello boys
hi boys
hi boys
hi world
she is saying hi
➜  linux_commands sort -r hello.txt
she is saying hi
hi world
hi boys
hi boys
hello boys
HELLO everyone
➜  linux_commands sort -R hello.txt
she is saying hi
hi boys
hi boys
HELLO everyone
hi world
hello boys

sort --ignore-case [file]: sort all lines of [file], ignoring case

➜  linux_commands sort --ignore-case hello.txt
hello boys
HELLO everyone
hi boys
hi boys
hi world
she is saying hi

sort -u [file]: sort all lines of [file] and keep only one copy of each line (duplicates are removed)

➜  linux_commands sort -u hello.txt
HELLO everyone
hello boys
hi boys
hi world
she is saying hi

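sort -t[ch] -k [num] [file]: split each line of [file] on the character [ch] (-t) and sort by the key that starts at field [num] (-k). With a single number the key runs to the end of the line; -k [num],[num] restricts it to that one field.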

➜  linux_commands sort -t' ' -k 2 hello.txt (sort by the text after the first space)
hello boys
hi boys
hi boys
HELLO everyone
she is saying hi
hi world
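
Sort order depends on the locale, so a script that needs reproducible output usually pins it. The sketch below assumes LC_ALL=C (plain byte order, uppercase before lowercase) and a scratch file with made-up contents; the key is restricted to the second field with -k 2,2.

```shell
#!/bin/sh
# Scratch file with made-up contents (an assumption for this sketch).
tmp=$(mktemp)
printf 'hi world\nhi boys\nhello boys\nHELLO everyone\nhi boys\n' > "$tmp"

# LC_ALL=C pins byte-order comparison so the result does not depend on the locale.
asc=$(LC_ALL=C sort "$tmp")
unique=$(LC_ALL=C sort -u "$tmp")

# Sort on the second space-separated field only; ties fall back to the whole line.
by_second=$(LC_ALL=C sort -t ' ' -k 2,2 "$tmp")

printf '%s\n---\n' "$asc" "$unique" "$by_second"
rm -f "$tmp"
```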

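ls -lh | sort -h(/-n) -k 5: sort the files and directories in the current directory by size. -h compares human-readable sizes, taking unit suffixes such as K and M into account; -n compares only the leading number.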

➜  linux_commands ls -lh | sort -n -k 5
total 376
-rw-r--r--  1 qiushye  staff    10B Apr  9 12:13 input.txt
-rw-r--r--  1 qiushye  staff    23B Apr 30 12:21 regex.txt
-rw-r--r--  1 qiushye  staff    68B May 14 12:34 hello.txt
drwxr-xr-x  4 qiushye  staff   128B Apr  4 22:17 ut
-rw-r--r--  1 qiushye  staff   161B Apr  4 22:43 diff.txt
-rw-r--r--  1 qiushye  staff   168K May 15 12:11 commodity.txt (the largest file, but not the largest number)
dr-xr-xrwx  6 eric     staff   192B Mar 22 21:24 tt
drwxr-xr-x  8 qiushye  staff   256B May 15 12:12 temp
➜  linux_commands ls -lh | sort -h -k 5
total 376
-rw-r--r--  1 qiushye  staff    10B Apr  9 12:13 input.txt
-rw-r--r--  1 qiushye  staff    23B Apr 30 12:21 regex.txt
-rw-r--r--  1 qiushye  staff    68B May 14 12:34 hello.txt
drwxr-xr-x  4 qiushye  staff   128B Apr  4 22:17 ut
-rw-r--r--  1 qiushye  staff   161B Apr  4 22:43 diff.txt
dr-xr-xrwx  6 eric     staff   192B Mar 22 21:24 tt
drwxr-xr-x  8 qiushye  staff   256B May 15 12:12 temp
-rw-r--r--  1 qiushye  staff   168K May 15 12:11 commodity.txt
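
The difference between -n and -h shows up with just three values. This is a sketch with made-up sizes; -h is an extension found in GNU sort and recent BSD sort rather than a POSIX option, so it may be missing on other systems.

```shell
#!/bin/sh
# Made-up sizes: two plain numbers and one with a K suffix.
sizes=$(printf '168K\n192\n10\n')

# -n looks only at the leading number, so 168K is ordered as 168.
numeric=$(printf '%s\n' "$sizes" | sort -n | paste -s -d ' ' -)

# -h understands the suffix, so 168K sorts after every plain number.
human=$(printf '%s\n' "$sizes" | sort -h | paste -s -d ' ' -)

echo "numeric: $numeric"
echo "human:   $human"
```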

sort [input_file] -o [output_file]: sort [input_file] and write the result to [output_file]

➜  linux_commands sort -u hello.txt -o hello_sorted.txt
➜  linux_commands cat hello_sorted.txt
HELLO everyone
hello boys
hi boys
hi world
she is saying hi

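sort -c [file]: check whether [file] is already sorted in ascending order; to check for descending order, add -r. Nothing is printed when the file is sorted.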

➜  linux_commands sort -c hello_sorted.txt
➜  linux_commands sort -c hello.txt
sort: hello.txt:2: disorder: hi boys
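
Because sort -c reports its result through the exit status, it fits naturally into an if statement. A minimal sketch, assuming scratch files in.txt and out.txt (hypothetical names) and LC_ALL=C for a predictable order:

```shell
#!/bin/sh
# Scratch files with made-up names and contents (assumptions for this sketch).
dir=$(mktemp -d)
printf 'hi world\nhi boys\nhello boys\n' > "$dir/in.txt"

# -o writes the sorted result to a file instead of stdout.
LC_ALL=C sort -o "$dir/out.txt" "$dir/in.txt"
result=$(cat "$dir/out.txt")

# sort -c exits 0 when the file is already in order, non-zero otherwise.
if LC_ALL=C sort -c "$dir/out.txt" 2>/dev/null; then out_sorted=yes; else out_sorted=no; fi
if LC_ALL=C sort -c "$dir/in.txt" 2>/dev/null; then in_sorted=yes; else in_sorted=no; fi

# The input happens to be in descending order, which -r -c accepts.
if LC_ALL=C sort -r -c "$dir/in.txt" 2>/dev/null; then in_desc=yes; else in_desc=no; fi

echo "out_sorted=$out_sorted in_sorted=$in_sorted in_desc=$in_desc"
rm -rf "$dir"
```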

6. Filtering repeated lines with uniq

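sort [file] | uniq : print the lines of [file], showing each repeated line only once. uniq only compares adjacent lines, which is why the input is sorted first.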

➜  linux_commands cat hello.txt
hi world
hi boys
hi world
hello boys
HELLO everyone
hi world
hi boys
➜  linux_commands sort hello.txt| uniq
HELLO everyone
hello boys
hi boys
hi world

sort [file] | uniq -u (/-d) : print only the lines of [file] that are not repeated; with -d instead, print only the lines that are repeated (one copy of each)

➜  linux_commands sort hello.txt| uniq -u
HELLO everyone
hello boys
➜  linux_commands sort hello.txt| uniq -d
hi boys
hi world

sort [file] | uniq -c : prefix each distinct line of [file] with the number of times it occurs

➜  linux_commands sort hello.txt| uniq -c
   1 HELLO everyone
   1 hello boys
   2 hi boys
   3 hi world

sort [file] | uniq -c | sort -nr : count the occurrences of each line of [file] and list them from most to least frequent

➜  linux_commands sort hello.txt| uniq -c | sort -nr
   3 hi world
   2 hi boys
   1 hello boys
   1 HELLO everyone
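
The need to sort first is easiest to see by running uniq on unsorted input. In the sketch below the scratch file contents are made up, and awk '{$1=$1; print}' is only there to normalize the count padding, which differs between uniq implementations.

```shell
#!/bin/sh
# Scratch file with made-up contents: duplicates exist, but none are adjacent.
tmp=$(mktemp)
printf 'hi world\nhi boys\nhi world\nhello boys\nhi world\nhi boys\n' > "$tmp"

# uniq alone removes nothing here, because it only compares neighboring lines.
raw=$(uniq "$tmp" | wc -l | tr -d ' ')

distinct=$(LC_ALL=C sort "$tmp" | uniq)
singles=$(LC_ALL=C sort "$tmp" | uniq -u)
repeated=$(LC_ALL=C sort "$tmp" | uniq -d)

# Frequency table, most frequent first; awk collapses the leading padding.
ranked=$(LC_ALL=C sort "$tmp" | uniq -c | LC_ALL=C sort -nr | awk '{$1=$1; print}')

echo "lines after plain uniq: $raw"
printf '%s\n' "$ranked"
rm -f "$tmp"
```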

sort [file] | uniq -i : collapse repeated lines of [file], ignoring case when comparing

➜  linux_commands cat hello.txt (the last two lines are newly added)
hi world
hi boys
hi world
hello boys
HELLO everyone
hi world
hi boys
oh boys
Hello everyone
➜  linux_commands sort hello.txt| uniq -i
HELLO everyone
hello boys
hi boys
hi world
oh boys

sort [file] | uniq -f [num] : skip the first [num] fields of each line when comparing, then collapse repeated lines

➜  linux_commands sort hello.txt| uniq -f 1
HELLO everyone
hello boys
hi world
oh boys
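
Both comparison modifiers can be tried on a few lines that are already in sorted order. A minimal sketch with made-up input:

```shell
#!/bin/sh
# Made-up input, already in sorted order so that uniq sees duplicates as neighbors.
input=$(printf 'HELLO everyone\nHello everyone\nhello boys\nhi boys\n')

# -i ignores case, so the two "everyone" lines count as the same line.
folded=$(printf '%s\n' "$input" | uniq -i)

# -f 1 skips the first field, so only " everyone" and " boys" are compared.
skipped=$(printf '%s\n' "$input" | uniq -f 1)

printf '%s\n---\n' "$folded" "$skipped"
```

In both cases uniq prints the first line of each group, which is why HELLO everyone survives rather than Hello everyone.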

7. Turning standard input into command-line arguments with xargs

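A shell command can take its data from standard input or from command-line arguments. Some commands read standard input, such as cat and grep; others, such as echo, do not and only accept command-line arguments. xargs converts standard input into the arguments a command needs.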

➜  temp ls | cat
one
test1
test3
test4
test5
test6
test7
three
two
➜  temp ls | echo (echo ignores standard input, so only an empty line is printed)

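xargs (echo): read strings from standard input until ctrl+d ends the input, then print them. echo is the default command, so adding it explicitly has the same effect.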

➜  temp xargs
a
vb
bbb
a vb bbb
➜  temp xargs echo
a
vb
bbb
a vb bbb

echo "one two three" | xargs mkdir: use the contents of standard input as the arguments to mkdir; by default, items are separated by spaces and newlines

➜  temp echo "one two three" | xargs mkdir
➜  temp ls
one   three two
➜  temp echo "aa\nbb" | xargs mkdir
➜  temp ls
aa    bb    one   three two

xargs -p (/ -t): -p prints the command that is about to run and asks for confirmation (y runs it); -t prints the command and then runs it immediately.

➜  temp ls | xargs -p echo
echo aa bb one three two?...y
aa bb one three two
➜  temp ls | xargs -t echo
echo aa bb one three two
aa bb one three two

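find [path] -type f -print0 | xargs -0 rm: find all files under [path] and delete them. By default xargs splits its input on whitespace, so a file name containing spaces would be broken into several arguments. find's -print0 option separates the file names with null characters instead, and xargs -0 splits on null, so files with spaces in their names are deleted correctly. (When testing this command, make sure the files under [path] can safely be deleted.)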

➜  ut ls  (ut is the directory name)
test2 test3
➜  ut find . -type f -print0 | xargs -0 rm
➜  ut
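
To try this safely, the whole experiment can be confined to a throwaway directory. This is a sketch: the directory comes from mktemp -d and the two file names are made up, one of them containing a space on purpose. -print0 and -0 are extensions available in GNU and BSD find/xargs.

```shell
#!/bin/sh
# Throwaway directory with made-up file names, one containing a space.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"
before=$(find "$dir" -type f | wc -l | tr -d ' ')

# Null-separated names survive the space in "with space.txt".
find "$dir" -type f -print0 | xargs -0 rm -f

after=$(find "$dir" -type f | wc -l | tr -d ' ')
echo "files before: $before, after: $after"
rmdir "$dir"
```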

find [path] -name [pattern] | xargs grep [str]: find the files under [path] whose names match [pattern], and print the lines in them that contain [str]

➜  linux_commands find . -name "hello*" | xargs grep hello
./hello_sorted.txt:hello boys
./hello.txt:hello boys

xargs -L(-n) [num]: -L passes [num] input lines to each invocation of the command; -n passes [num] items to each invocation

➜  linux_commands xargs -L 1 find . -name (each input line becomes the argument of find . -name)
"*.txt" (first input)
./hh.txt
./regex.txt
./diff.txt
./input.txt
./hello_sorted.txt
./hello.txt
./commodity.txt
"hello*" (second input)
./hello_sorted.txt
./hello.txt
➜  linux_commands echo {0..9} | xargs -n 2 echo
0 1
2 3
4 5
6 7
8 9
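
The difference between grouping by items and grouping by lines is clearest when the same input goes through both. A minimal sketch with made-up input of two lines holding five items:

```shell
#!/bin/sh
# No option: all items end up in one echo invocation.
all=$(printf 'a b\nc d e\n' | xargs echo)

# -n 2: at most two items per invocation, regardless of line breaks.
by_two=$(printf 'a b\nc d e\n' | xargs -n 2 echo)

# -L 1: one input line per invocation, however many items it holds.
by_line=$(printf 'a b\nc d e\n' | xargs -L 1 echo)

printf '%s\n---\n' "$all" "$by_two" "$by_line"
```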

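xargs -I [str]: replace every occurrence of [str] in the command with the current input line, much like a variable name; this allows one item to be passed to several commands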

➜  temp ls
input.txt three     two
➜  temp cat input.txt| xargs -I name sh -c 'echo name; mkdir name' (name is the placeholder for each input line)
aa
bb
cc
➜  temp ls
aa        bb        cc        input.txt three     two
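
A variation on the command above, offered as a sketch rather than a replacement: instead of substituting the item into the sh -c string, it passes the item as a positional parameter ($1), which avoids surprises if an input line contains quotes or other shell syntax. The {} placeholder, the scratch directory and the "creating" message are choices made for this example.

```shell
#!/bin/sh
# Scratch directory with a made-up input file (assumptions for this sketch).
dir=$(mktemp -d)
printf 'aa\nbb\ncc\n' > "$dir/input.txt"

# {} is replaced by each input line and handed to sh as $1 ("_" fills $0).
log=$(cd "$dir" && xargs -I {} sh -c 'echo "creating $1"; mkdir "$1"' _ {} < input.txt)

# List the directories that were created, joined onto one line.
made=$(cd "$dir" && ls -d aa bb cc | paste -s -d ' ' -)

printf '%s\n' "$log"
echo "made: $made"
rm -rf "$dir"
```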
