Deduplicating at a specified position on Linux: using the uniq command - 米扑博客

Latest recommended article published on 2024-01-18 07:00:00

几深老李

The uniq command reports or removes repeated lines in a text file. It only compares adjacent lines, so it is usually combined with sort.

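Command syntax:

uniq [-cdDui] [-f Fields] [-s N] [-w N] [InFile [OutFile]]

Options:

-c: prefix each output line with the number of times it occurred.

-d: print only the repeated lines, one line per group.

-D: print all repeated lines, that is, every copy of them.

-u: print only the lines that occur exactly once.

-i: ignore differences in case when comparing.

-f Fields: skip the first Fields whitespace-separated fields when comparing.

-s N: skip the first N characters when comparing.

-w N: compare no more than N characters per line; anything after them is ignored.

[InFile]: the input text file, normally already sorted. If omitted, uniq reads from standard input.

[OutFile]: the output file. If omitted, the result is written to standard output (the terminal).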
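
As a small illustration of the two positional arguments, the sketch below runs uniq once with an input and an output file and once as a filter on standard input. The names in.txt and out.txt and the sample data are made up for this example.

```shell
#!/bin/sh
# Work in a throwaway directory; in.txt and out.txt are hypothetical names.
workdir=$(mktemp -d)
printf 'a\na\nb\nb\nc\n' > "$workdir/in.txt"

# Two operands: read in.txt and write the result to out.txt (nothing is printed).
uniq "$workdir/in.txt" "$workdir/out.txt"
from_file=$(cat "$workdir/out.txt")

# No operands: read standard input and write to standard output.
from_stdin=$(printf 'a\na\nb\nb\nc\n' | uniq)

echo "$from_file"
echo "$from_stdin"
rm -rf "$workdir"
```

Because the second operand is always treated as the output file, `uniq a.txt b.txt` overwrites b.txt rather than reading it as a second input.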

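The same syntax with the long option names:

uniq [-cdu] [-f <fields>] [-s <chars>] [-w <chars>] [--help] [--version] [input file] [output file]

Options:

-c or --count: prefix each output line with the number of times it occurred.

-d or --repeated: print only the repeated lines.

-f or --skip-fields=N: skip the first N fields when comparing.

-s or --skip-chars=N: skip the first N characters when comparing.

-u or --unique: print only the lines that occur exactly once.

-w or --check-chars=N: compare no more than N characters per line.

--help: show help.

--version: show version information.

[input file]: the input text file, normally already sorted. If omitted, uniq reads from standard input.

[output file]: the output file. If omitted, the result is written to standard output (the terminal).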

uniq usage examples

Create a test file

vim uniq.txt

# cat uniq.txt

My name is Delav

My name is Delav

My name is Delav

I'm learning Java

I'm learning Java

I'm learning Java

who am i

Who am i

Python is so simple

My name is Delav

That's good

That's good

And studying Golang
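
To recreate the test file without opening an editor, a here-document is one option. The repeat counts used here are inferred from the `uniq -c` output shown in the examples that follow (3, 3, 1, 1, 1, 1, 2, 1), which add up to 13 lines; treat this as a reconstruction rather than the author's original file.

```shell
#!/bin/sh
# Recreate uniq.txt; the repeat counts follow the `uniq -c` output in the examples.
cat > uniq.txt <<'EOF'
My name is Delav
My name is Delav
My name is Delav
I'm learning Java
I'm learning Java
I'm learning Java
who am i
Who am i
Python is so simple
My name is Delav
That's good
That's good
And studying Golang
EOF

# Sanity check: total lines, and lines left after collapsing adjacent duplicates.
line_count=$(wc -l < uniq.txt | tr -d ' ')
group_count=$(uniq uniq.txt | wc -l | tr -d ' ')
echo "$line_count lines, $group_count after uniq"
```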

1. Plain deduplication

uniq uniq.txt

Result:

# uniq uniq.txt

My name is Delav

I'm learning Java

who am i

Who am i

Python is so simple

My name is Delav

That's good

And studying Golang

2. Show how many times each line is repeated

uniq -c uniq.txt

Result:

# uniq -c uniq.txt

3 My name is Delav

3 I'm learning Java

1 who am i

1 Who am i

1 Python is so simple

1 My name is Delav

2 That's good

1 And studying Golang

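Notice that "My name is Delav" still appears twice in the output above.

In other words, uniq only collapses duplicates that are adjacent; when repeated lines are not next to each other, it leaves them alone.

For that reason uniq is usually used together with sort; see the 米扑博客 post "Linux 删除重复行" (removing duplicate lines on Linux).

sort uniq.txt | uniq -c

Result: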

# sort uniq.txt | uniq -c

1 And studying Golang

3 I'm learning Java

4 My name is Delav

1 Python is so simple

2 That's good

1 who am i

1 Who am i
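
The adjacency rule can be checked on a tiny input. This is a minimal sketch with made-up data: the letter "a" appears twice, but never on neighbouring lines.

```shell
#!/bin/sh
# Hypothetical input: "a" appears twice, but not on adjacent lines.
input=$(printf 'a\nb\na\n')

# uniq alone compares each line only with the line before it.
without_sort=$(printf '%s\n' "$input" | uniq | wc -l | tr -d ' ')

# Sorting first brings identical lines together, so uniq can collapse them.
with_sort=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq | wc -l | tr -d ' ')

echo "uniq only:   $without_sort lines"
echo "sort | uniq: $with_sort lines"
```

`LC_ALL=C` is used only to make the sort order byte-wise and reproducible. Under other locales the relative order of lines such as "who am i" and "Who am i" may differ from what is shown here.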

To take the union of two files, keeping a single copy of each repeated line:

sort file1 file2 | uniq
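
A minimal sketch of the union, using two made-up files of fruit names (the contents are illustrative only). It also shows that `sort -u` gives the same result in one step.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical sample data: "banana" is present in both files.
printf 'apple\nbanana\n' > "$workdir/file1"
printf 'banana\ncherry\n' > "$workdir/file2"

# Union: every distinct line from either file, printed once.
union=$(LC_ALL=C sort "$workdir/file1" "$workdir/file2" | uniq)

# sort -u sorts and deduplicates in a single command.
union_u=$(LC_ALL=C sort -u "$workdir/file1" "$workdir/file2")

echo "$union"
rm -rf "$workdir"
```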

3. Show only the repeated lines, with their counts (the basis for taking an intersection)

uniq -cd uniq.txt

Result:

# uniq -cd uniq.txt

3 My name is Delav

3 I'm learning Java

2 That's good

-D prints every copy of each repeated line; it cannot be combined with -c

uniq -D uniq.txt

Result (the -d output is shown first for comparison):

# uniq -d uniq.txt

My name is Delav

I'm learning Java

That's good

# uniq -D uniq.txt

My name is Delav

My name is Delav

My name is Delav

I'm learning Java

I'm learning Java

I'm learning Java

That's good

That's good
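
The difference between -d and -D is easiest to see by counting output lines. The sketch below uses made-up input and assumes GNU uniq, since -D is not part of the POSIX option set.

```shell
#!/bin/sh
# Hypothetical input: "x" three times, "y" once, "z" twice.
input=$(printf 'x\nx\nx\ny\nz\nz\n')

# -d: one representative line per repeated group.
one_each=$(printf '%s\n' "$input" | uniq -d)

# -D: every copy of every repeated line.
all_copies=$(printf '%s\n' "$input" | uniq -D)

echo "-d:"
echo "$one_each"
echo "-D:"
echo "$all_copies"
```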

To take the intersection of two files, keeping only the lines that exist in both files:

sort file1 file2 | uniq -d
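
This recipe quietly assumes that neither file contains repeated lines of its own. A minimal sketch with made-up files shows what happens otherwise, and one possible way to guard against it by deduplicating each file first (the intermediate names f1.sorted and f2.sorted are hypothetical).

```shell
#!/bin/sh
workdir=$(mktemp -d)
# file1 repeats "apple" internally; "banana" is the only line shared by both files.
printf 'apple\napple\nbanana\n' > "$workdir/file1"
printf 'banana\ncherry\n' > "$workdir/file2"

# Naive form: "apple" is reported even though file2 does not contain it.
naive=$(LC_ALL=C sort "$workdir/file1" "$workdir/file2" | uniq -d)

# Deduplicate each file first, then look for lines that occur twice.
LC_ALL=C sort -u "$workdir/file1" > "$workdir/f1.sorted"
LC_ALL=C sort -u "$workdir/file2" > "$workdir/f2.sorted"
strict=$(LC_ALL=C sort "$workdir/f1.sorted" "$workdir/f2.sorted" | uniq -d)

echo "naive:"
echo "$naive"
echo "strict:"
echo "$strict"
rm -rf "$workdir"
```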

4. Show only the lines that are not repeated; every repeated line is dropped (removes the intersection)

sort uniq.txt | uniq -cu

Result:

# sort uniq.txt | uniq -cu

1 And studying Golang

1 Python is so simple

1 who am i

1 Who am i

To remove the intersection of two files (drop every repeated line) and keep only the lines that occur exactly once:

sort file1 file2 | uniq -u
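
A minimal sketch with made-up files, assuming neither file repeats a line internally. The second command is a common variation, not something from the original article: listing file2 twice forces each of its lines to repeat, so only the lines unique to file1 survive.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical sample data: "banana" is present in both files.
printf 'apple\nbanana\n' > "$workdir/file1"
printf 'banana\ncherry\n' > "$workdir/file2"

# Lines that occur exactly once across both files.
either_only=$(LC_ALL=C sort "$workdir/file1" "$workdir/file2" | uniq -u)

# Variation: file2 is listed twice, so all of its lines are dropped by -u.
file1_only=$(LC_ALL=C sort "$workdir/file1" "$workdir/file2" "$workdir/file2" | uniq -u)

echo "in exactly one file:"
echo "$either_only"
echo "only in file1:"
echo "$file1_only"
rm -rf "$workdir"
```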

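5. Skip leading fields

Here -f 1 skips the first whitespace-separated field of each line, so "who am i" and "Who am i" are both compared as " am i" and are treated as duplicates

uniq -c -f 1 uniq.txt

Result: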

# uniq -c -f 1 uniq.txt

3 My name is Delav

3 I'm learning Java

2 who am i

1 Python is so simple

1 My name is Delav

2 That's good

1 And studying Golang
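
A typical use of -f is collapsing lines that differ only in a leading timestamp. The log lines below are invented for illustration.

```shell
#!/bin/sh
# Hypothetical log lines: the first field is a timestamp, the rest is the message.
log=$(printf '%s\n' '10:00:01 disk full' '10:00:02 disk full' '10:00:03 disk ok')

# Skip the timestamp field; adjacent lines with the same message collapse,
# and the first line of each group is the one that is kept.
collapsed=$(printf '%s\n' "$log" | uniq -f 1)

echo "$collapsed"
```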

6. Ignore case

Here -i ignores case, so "who am i" and "Who am i" are treated as duplicates

uniq -c -i uniq.txt

Result:

# uniq -c -i uniq.txt

3 My name is Delav

3 I'm learning Java

2 who am i

1 Python is so simple

1 My name is Delav

2 That's good

1 And studying Golang
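
The adjacency rule still applies with -i. When lines that differ only in case are not adjacent, one approach is to sort with case folding (`sort -f`) before running `uniq -i`. A minimal sketch on made-up input:

```shell
#!/bin/sh
# Hypothetical input: the two "who am i" variants are separated by another line.
input=$(printf 'Who am i\nhello\nwho am i\n')

# uniq -i alone: the variants are not adjacent, so nothing is collapsed.
adjacent_only=$(printf '%s\n' "$input" | uniq -i | wc -l | tr -d ' ')

# sort -f folds case while sorting, which brings the variants together.
folded=$(printf '%s\n' "$input" | LC_ALL=C sort -f | uniq -i | wc -l | tr -d ' ')

echo "uniq -i only:      $adjacent_only lines"
echo "sort -f | uniq -i: $folded lines"
```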

7. Skip the first N characters

Here -s 4 skips the first four characters, so "who am i" and "Who am i" are both compared as "am i" and are treated as duplicates

uniq -c -s 4 uniq.txt

Result:

# uniq -c -s 4 uniq.txt

3 My name is Delav

3 I'm learning Java

2 who am i

1 Python is so simple

1 My name is Delav

2 That's good

1 And studying Golang

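8. Compare only the first N characters

Here -w 2 compares only the first two characters of each line. "who am i" and "Who am i" already differ in their first letter ("wh" versus "Wh"), so they are not treated as duplicates

uniq -c -w 2 uniq.txt

Result: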

# uniq -c -w 2 uniq.txt

3 My name is Delav

3 I'm learning Java

1 who am i

1 Who am i

1 Python is so simple

1 My name is Delav

2 That's good

1 And studying Golang
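
Combining -s and -w restricts the comparison to a specific character range, which is one way to deduplicate "at a specified position". The sketch below assumes GNU uniq (-w is a GNU extension) and an invented fixed-width record layout.

```shell
#!/bin/sh
# Hypothetical records: a 4-character ID, a 3-character code, then free text.
records=$(printf '%s\n' '0001ABC first' '0002ABC second' '0003XYZ third')

# Skip the 4-character ID, then compare only the next 3 characters (the code).
by_code=$(printf '%s\n' "$records" | uniq -s 4 -w 3)

echo "$by_code"
```

Only the first record of each run with the same code is kept, so the second "ABC" line is dropped.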

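Summary

uniq has no effect on repeated lines that are not adjacent.

For that reason it is usually used together with sort; see the 米扑博客 post "Linux 删除重复行" (removing duplicate lines on Linux).

sort file1 file2 | uniq: the union of the two files (one copy of each repeated line)

sort file1 file2 | uniq -u: removes the intersection (drops every repeated line), keeping only lines that occur once

sort file1 file2 | uniq -d: the intersection of the two files (only lines that exist in both files)

Examples:

Remove duplicate lines: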

uniq file.txt

sort file.txt | uniq

sort -u file.txt

Show only the lines that occur once:

uniq -u file.txt

sort file.txt | uniq -u

Count how many times each line occurs in the file:

sort file.txt | uniq -c

Find the repeated lines in the file:

sort file.txt | uniq -d
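
The counting pipeline is often extended with a second, numeric sort to rank lines by frequency. This is a common idiom rather than something taken from the original article; the input is made up, and awk is used only to strip the padding that uniq -c puts in front of each count.

```shell
#!/bin/sh
# Hypothetical input: "b" three times, "a" twice, "c" once.
input=$(printf 'b\na\nb\nc\nb\na\n')

# Count each distinct line, then sort by count, highest first.
ranked=$(printf '%s\n' "$input" \
  | LC_ALL=C sort \
  | uniq -c \
  | LC_ALL=C sort -rn \
  | awk '{print $1, $2}')

echo "$ranked"
```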
