Linux Data Processing Tools

Latest recommended article published on 2024-06-17 17:12:45

M.za


Article tags: linux, server networking

Article link: https://blog.csdn.net/qq_46319647/article/details/138875691


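I. sort: sorting file contents

1.1 Syntax and options

Syntax:
sort [options] file
	-c check whether the file is already sorted
	-t use the given character as the field separator
	-k sort by the given key (field); -k 3 means from field 3 to the end of the line, -k 3,3 means field 3 only
	-n compare by numeric value
	-r reverse the order
	-o write the result to the given file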
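Every UID in passwd is a single digit, so text order and numeric order happen to agree there. The sketch below shows where -n and the end field of -k matter. It assumes a POSIX shell, uses made-up sample lines, and sets the C locale so that text order is plain byte order.

```shell
#!/bin/sh
# Hypothetical sample lines name:x:number, deliberately not in numeric order
LC_ALL=C
export LC_ALL

sample='bob:x:2
carol:x:33
alice:x:10'

# Text comparison of field 3 only: "10" < "2" < "33", character by character
text_order=$(printf '%s\n' "$sample" | sort -t ':' -k 3,3 | cut -d ':' -f 1 | paste -s -d ' ' -)

# Numeric comparison of field 3 only: 2 < 10 < 33
num_order=$(printf '%s\n' "$sample" | sort -t ':' -k 3,3n | cut -d ':' -f 1 | paste -s -d ' ' -)

# Numeric and reversed, like -n -r in the examples below
rev_order=$(printf '%s\n' "$sample" | sort -t ':' -k 3,3 -n -r | cut -d ':' -f 1 | paste -s -d ' ' -)

echo "text:    $text_order"
echo "numeric: $num_order"
echo "reverse: $rev_order"
```

Without the ",3" the key would run from field 3 to the end of the line. When the third fields compare equal, that matters, because the rest of the line then decides the order.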

1.2 Example: by default, sort compares whole lines as text (alphabetical order)

[root@10-255-20-182 ~]# sort passwd 
adm:x:3:4:adm:/var/adm:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
halt:x:7:0:halt:/sbin:/sbin/halt
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
root:x:0:0:root:/root:/bin/bash
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync

[root@10-255-20-182 ~]# cat passwd 
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

1.3 Example: check whether a file is sorted

[root@10-255-20-182 ~]# sort -c passwd 
sort: passwd:2: disorder: bin:x:1:1:bin:/bin:/sbin/nologin

1.4 Example: set a field separator and sort by a field

[root@10-255-20-182 ~]# sort -t ':' -k 3 passwd 
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

[root@10-255-20-182 ~]# sort -t ':' -k 4 passwd 
halt:x:7:0:halt:/sbin:/sbin/halt
root:x:0:0:root:/root:/bin/bash
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

1.5 Example: sort by a field in descending numeric order

[root@10-255-20-182 ~]# sort -t ':' -k 3 -r -n passwd 
halt:x:7:0:halt:/sbin:/sbin/halt
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
root:x:0:0:root:/root:/bin/bash

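1.6 Example: save the sorted result to a file

The file given to -o may be the input file itself. sort reads all of its input before it writes, so this is safe. Redirecting with > passwd is not safe, because the shell truncates the file before sort reads it.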

[root@10-255-20-182 ~]# sort -t ':' -k 4 passwd -o /root/passwd 
[root@10-255-20-182 ~]# cat passwd 
halt:x:7:0:halt:/sbin:/sbin/halt
root:x:0:0:root:/root:/bin/bash
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

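II. uniq: removing duplicate lines

2.1 Syntax and options

Syntax:
sort [options] file | uniq [options]     !!! use it together with sort !!!
	-c prefix each line with the number of times it occurs. uniq only compares adjacent lines, so duplicates that are not next to each other are not counted together; sort the input first
	-d show only lines that occur more than once
	-u show only lines that occur exactly once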
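The sketch below shows why uniq needs sorted input. It uses a hypothetical three-line input in a POSIX shell. sort -u is a standard sort option not covered above, and it is included as a one-step alternative.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical input: the two qqqq lines are NOT adjacent
data='qqqq
wwww
qqqq'

# uniq alone only merges adjacent duplicates, so all 3 lines survive
plain=$(printf '%s\n' "$data" | uniq | wc -l | tr -d ' ')

# After sort the duplicates are adjacent and uniq merges them into 2 lines
sorted=$(printf '%s\n' "$data" | sort | uniq | wc -l | tr -d ' ')

# uniq -c then reports qqqq with a count of 2
counted=$(printf '%s\n' "$data" | sort | uniq -c | awk '$2 == "qqqq" {print $1}')

# sort -u sorts and drops duplicates in a single step
one_step=$(printf '%s\n' "$data" | sort -u | paste -s -d ' ' -)

echo "uniq only: $plain, sort|uniq: $sorted, qqqq count: $counted, sort -u: $one_step"
```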

2.2 Sample content

[root@10-255-20-182 ~]# cat passwd 
qqqq
wwww
qqqq
wwww
iiii
aaaa
ssss
ssss
aaaa
vvvv
vvvv

2.3 Example: count how often each line occurs

[root@10-255-20-182 ~]# sort passwd | uniq -c
      2 aaaa
      1 iiii
      2 qqqq
      2 ssss
      2 vvvv
      2 wwww

2.4 Example: show only the duplicated lines

[root@10-255-20-182 ~]# sort passwd | uniq -d
aaaa
qqqq
ssss
vvvv
wwww

2.5 Example: show only the lines that are not duplicated

[root@10-255-20-182 ~]# sort passwd | uniq -u
iiii

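III. wc: counting

wc [options] file
	-l count lines (blank lines included; strictly, wc counts newline characters)
	-w count words (a word is a run of characters separated by whitespace; no other separator can be specified)
	-m count characters
	-c count bytes
Without options, wc prints the line, word, and byte counts of the file.
To count several files, list them separated by spaces; wc then also prints a total line.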
[root@didiyun ~]# cat hello.sh 
#!/bin/bash
echo "hello,linux"
[root@didiyun ~]# wc -l hello.sh
2 hello.sh
[root@didiyun ~]# wc -w hello.sh
3 hello.sh
[root@didiyun ~]# wc -m hello.sh
31 hello.sh
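The counts above can be reproduced on a scratch copy of hello.sh. This sketch assumes a POSIX shell with mktemp. Reading the file through < keeps the file name out of the output, and tr strips the padding that some wc implementations add. The -c (bytes) option is included for comparison; for pure ASCII text it equals -m.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Recreate the two-line hello.sh shown above
printf '%s\n' '#!/bin/bash' 'echo "hello,linux"' > "$tmp/hello.sh"

lines=$(wc -l < "$tmp/hello.sh" | tr -d ' ')
words=$(wc -w < "$tmp/hello.sh" | tr -d ' ')
chars=$(wc -m < "$tmp/hello.sh" | tr -d ' ')
bytes=$(wc -c < "$tmp/hello.sh" | tr -d ' ')

rm -r "$tmp"
echo "lines=$lines words=$words chars=$chars bytes=$bytes"
```

"hello,linux" counts as one word because wc splits only on whitespace, not on the comma.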

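The Three Musketeers of Text Processing and Regular Expressions

I. The three text musketeers

1.1 The youngest: grep

Filters and searches: grep matches lines, printing the lines of a file that contain a string matching the given condition.
grep [options] 'pattern' file
	-v invert the match (print the lines that do NOT match)
	-i ignore case
	-n show line numbers
	-w match whole words only (exact match): the match must not be directly preceded or followed by a letter, digit, or underscore
	-o print only the matched text, one match per line
	-E extended grep (with -E, the pattern is an extended regular expression)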
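The sketch below exercises these options on a scratch copy of the sample file. It assumes a POSIX shell with mktemp. The -c option, which counts matching lines, is an extra standard grep option used here only to make the results easy to check.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# The sample lines used below, including the fourth line that appears later
printf '%s\n' 'hello studying linux' 'welcome to weibo' 'i am teacher' 'linux linux linux' > "$tmp/passwd"

inverted=$(grep -v 'linux' "$tmp/passwd" | paste -s -d ',' -)   # lines WITHOUT linux
numbered=$(grep -n 'weibo' "$tmp/passwd")                        # "line number:line"
nocase=$(grep -i -c 'LINUX' "$tmp/passwd")                       # matching lines, case ignored
partial=$(grep -w -c 'linu' "$tmp/passwd")                       # linu is never a whole word here
only=$(grep -o 'linux' "$tmp/passwd" | wc -l | tr -d ' ')        # one output line per match
either=$(grep -E -c 'weibo|teacher' "$tmp/passwd")               # ERE alternation

rm -r "$tmp"
echo "$inverted | $numbered | $nocase $partial $only $either"
```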
	
1. Prepare the environment
[root@test ~]# cat passwd 
hello studying linux
welcome to weibo
i am teacher

2. Extract the lines you want
[root@test ~]# grep 'teacher' passwd 
i am teacher
[root@test ~]# grep 'weibo' passwd 
welcome to weibo
[root@test ~]# grep 'linux' passwd 
hello studying linux

3. Invert the match
[root@test ~]# grep -v 'linux' passwd
welcome to weibo
i am teacher

4. Ignore case
[root@test ~]# grep -i 'linux' passwd
hello studying linux
[root@test ~]# grep -i 'LINUX' passwd
hello studying linux
[root@test ~]# grep -i 'LInuX' passwd
hello studying linux

5. Show the line numbers of matching lines
[root@test ~]# grep -n 'linux' passwd 
1:hello studying linux
[root@test ~]# grep -n 'weibo' passwd 
2:welcome to weibo
[root@test ~]# grep -n 'am' passwd 
3:i am teacher

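6. Exact (whole-word) match; the last three patterns do not occur as whole words in the file, so they print nothing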
[root@test ~]# grep -w 'linux' passwd 
hello studying linux
[root@test ~]# grep -w 'linx' passwd 
[root@test ~]# grep -w 'linuxa' passwd 
[root@test ~]# grep -w 'lijnux' passwd 

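7. Show only the matched text (passwd now has a fourth line, linux linux linux, so grep -o prints four matches)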
[root@test ~]# grep -o 'linux' passwd 
linux
linux
linux
linux

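8. Extended grep: match either of two strings. In a basic regular expression, | is an ordinary character, so the first command prints nothing; with -E it means "or". egrep is equivalent to grep -E, though recent GNU grep releases warn that egrep is obsolescent.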
[root@test ~]# grep 'linux|weibo' passwd 
[root@test ~]# grep -E 'linux|weibo' passwd 
hello studying linux
welcome to weibo
linux linux linux
[root@test ~]# egrep 'linux|weibo' passwd 
hello studying linux
welcome to weibo
linux linux linux

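1.1.1 Practice: searching

linux.txt is known to contain the lines test, weibo, and 123. Print the contents of linux.txt without the line test (both -v and an -E alternation work).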

[root@test ~]# cat linux.txt 
test
weibo
123
[root@test ~]# grep -v 'test' linux.txt 
weibo
123
[root@test ~]# grep -E 'weibo|123' linux.txt 
weibo
123

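II. Regular expressions

A regular expression is a predefined pattern that can stand for certain content: a set of rules defined for processing large amounts of strings and text. It improves efficiency by letting you quickly extract what you want. Regular expressions are interpreted by text-processing tools such as grep, sed, and awk. The shell's own filename wildcards (*, ?) are not regular expressions, and ordinary commands neither support nor suit them.
Features:
1. A set of rules and methods for processing large amounts of text and characters
2. They work line by line
3. They turn complex tasks into simple ones and make work on Linux more efficient
Under the hood, regex engines generally compile the pattern into a finite automaton (NFA or DFA); GNU grep additionally uses Boyer-Moore-style searching for literal strings.

2.1 Types of regular expressions

BRE: basic regular expressions
ERE: extended regular expressions (grep -E, sed -r or -E, awk)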
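The sketch below writes the same pattern both ways, using hypothetical sample words. In a BRE the interval braces need backslashes, and a bare | is an ordinary character. In an ERE, both the braces and | are special as written.

```shell
#!/bin/sh
# Hypothetical sample words
sample='gd
god
good
gooood'

# BRE: the interval braces must be escaped to be special (exactly two o's)
bre=$(printf '%s\n' "$sample" | grep 'go\{2\}d')
# ERE (grep -E): the same interval without backslashes
ere=$(printf '%s\n' "$sample" | grep -E 'go{2}d')

# In a BRE a bare | is literal, so no line matches
bre_bar=$(printf '%s\n' "$sample" | grep -c 'gd|god')
# In an ERE | means "or": the gd and god lines match
ere_bar=$(printf '%s\n' "$sample" | grep -c -E 'gd|god')

echo "bre=$bre ere=$ere bre_bar=$bre_bar ere_bar=$ere_bar"
```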

2.2 Basic regular expressions

2.2.1 ^

^ matches lines that begin with the given text

[root@test ~]# grep '^l' linux.txt 
l teach linux
l like badminton ball , billiard ball and chinese chess!
[root@test ~]# grep '^q' linux.txt 
qq813898657

[root@test ~]# ll | grep '^d'    # list only directories: in ls -l output (ll is its alias here) directory entries start with d
drwxr-xr-x. 2 root root      6 Dec 17 18:56 下载
drwxr-xr-x. 2 root root      6 Dec 17 18:56 公共
drwxr-xr-x. 2 root root      6 Dec 17 18:56 图片

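2.2.1.1 ^.*

^.* matches anything from the start of a line, including an empty line

[root@didiyun ~]# grep -o '^.*n' linux.txt    # from the start of the line up to the last n (.* is greedy)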
l teach lin
l like badminton ball , billiard ball and chin

2.2.2 $

$ matches lines that end with the given text
[root@test ~]# grep '!$' linux.txt    # ! needs no escaping inside single quotes
l like badminton ball , billiard ball and chinese chess!
[root@test ~]# grep 'x$' linux.txt 
l teach linux

2.2.2.1 .*$

.*$ matches anything up to the end of a line, including an empty line

[root@didiyun ~]# grep -o 'n.*$' linux.txt    # from the first n to the end of the line
nux
nton ball , billiard ball and chinese chess!

..* (at least one character) does not match empty lines

2.2.3 ^$

^$ matches empty lines

[root@test ~]# cat -n linux.txt 
     1	l teach linux
     2	
     3	l like badminton ball , billiard ball and chinese chess!
     4	qq813898657
     
[root@test ~]# grep -n '^$' linux.txt 
2:
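The sketch below checks the anchors together on a hypothetical four-line sample with one empty line, assuming a POSIX shell. grep -c counts matching lines.

```shell
#!/bin/sh
# Hypothetical sample; line 2 is empty
sample='l teach linux

l like badminton
qq813898657'

starts_l=$(printf '%s\n' "$sample" | grep -c '^l')                      # lines beginning with l
ends_x=$(printf '%s\n' "$sample" | grep -c 'x$')                        # lines ending with x
empty=$(printf '%s\n' "$sample" | grep -c '^$')                         # empty lines
line_no=$(printf '%s\n' "$sample" | grep -n '^$' | cut -d ':' -f 1)     # where the empty line is
non_empty=$(printf '%s\n' "$sample" | grep -v '^$' | wc -l | tr -d ' ') # drop empty lines

echo "starts_l=$starts_l ends_x=$ends_x empty=$empty at line $line_no, non_empty=$non_empty"
```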

2.2.4 .

. matches exactly one arbitrary character, so it never matches an empty line

[root@test ~]# grep '.' linux.txt    # every non-empty line matches
l teach linux
l like badminton ball , billiard ball and chinese chess!
qq813898657

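\ is the escape character: it makes a special character match literally, e.g. \. matches a literal period. (For the next two commands, a period has been added to the end of the last line.)
[root@test ~]# grep '\.$' linux.txt    # lines ending with a period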
qq813898657.

[root@test ~]# grep '.$' linux.txt    # lines ending with any character
l teach linux
l like badminton ball , billiard ball and chinese chess!
qq813898657.

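2.2.5 *

* matches the preceding character zero or more times

[root@test ~]# grep 'l*' linux.txt    # zero or more l's; since zero also counts, this pattern matches every line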
l teach linux
l like badminton ball , billiard ball and chinese chess!
lllllll
llll
llllll

[root@test ~]# grep 'll*' linux.txt    # at least one l
l teach linux
l like badminton ball , billiard ball and chinese chess!
lllllll
llll
llllll

[root@test ~]# grep 'lll*' linux.txt    # at least two consecutive l's
l like badminton ball , billiard ball and chinese chess!
billlliard
lllllll
llll
llllll

[root@test ~]# grep 'billl*iard' linux.txt    # bi, then at least two l's, then iard; bi and iard pin the match in place
l like badminton ball , billiard ball and chinese chess!
billlliard

2.2.5.1 .*

.* matches everything, including empty lines
[root@test ~]# grep '.*' linux.txt 
l teach linux

l like badminton ball , billiard ball and chinese chess!
qq813898657.
lllllll
llll
llllll
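The sketch below shows greedy matching and the * operator on a hypothetical line and word list, assuming a POSIX shell.

```shell
#!/bin/sh
line='l like badminton ball'

# ^.*n is greedy: .* takes as much as it can, so the match ends at the LAST n
to_last_n=$(printf '%s\n' "$line" | grep -o '^.*n')
# n.*$ starts at the leftmost n and runs to the end of the line
from_first_n=$(printf '%s\n' "$line" | grep -o 'n.*$')

# Hypothetical words for the * examples
words='bd
biliard
billiard
billlliard'
at_least_one_l=$(printf '%s\n' "$words" | grep -c 'll*')    # biliard, billiard, billlliard
at_least_two_l=$(printf '%s\n' "$words" | grep -c 'lll*')   # billiard, billlliard
bounded=$(printf '%s\n' "$words" | grep 'billl*iard' | paste -s -d ' ' -)
any=$(printf '%s\n' "$words" | grep -c 'l*')                # zero l's allowed: every line

echo "$to_last_n | $from_first_n | $at_least_one_l $at_least_two_l | $bounded | $any"
```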

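2.2.6 [] and [^]

[abc] matches any one character in the set. Because a, b, c are consecutive letters, it can also be written as the range [a-c].
For non-consecutive letters, just list the letters you want, e.g. [acrn].
Consecutive digits are written [0-9] or [0123456789].
For non-consecutive digits, just list the digits you want, e.g. [249].
To match a range of digits and a range of letters, write [0-9a-z].
For non-consecutive digits and letters, list them all,
e.g. [135acbyp]. Do not separate the members with commas: inside brackets a comma is an ordinary character, so [a,h,j] would also match ",".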

[root@wb ~]# grep '[a-c]' linux.txt 
l teach linux
l like badminton ball , billiard ball and chinese chess!
[root@wb ~]# grep '[ahj]' linux.txt 
l teach linux
l like badminton ball , billiard ball and chinese chess!

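[^abc] negation: matches any one character NOT in the set. A line is printed when it contains at least one such character, so lines that also contain a, b, or c can still match: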
[root@wb ~]# grep '[^a-c]' linux.txt 
l teach linux
l like badminton ball , billiard ball and chinese chess!
llllll
l
llll
llllll
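The sketch below applies bracket expressions to a hypothetical sample. LC_ALL=C is set so that [a-c] means exactly a, b, and c.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical sample
sample='abc
xyz
a,b
123'

ahj=$(printf '%s\n' "$sample" | grep -c '[ahj]')                   # abc, a,b
only_comma=$(printf '%s\n' "$sample" | grep -c '[,]')              # a,b: the comma is literal
digits_only=$(printf '%s\n' "$sample" | grep -c '^[0-9][0-9]*$')   # 123
has_other=$(printf '%s\n' "$sample" | grep -c '[^a-c]')            # xyz, a,b (comma), 123
only_abc=$(printf '%s\n' "$sample" | grep -c '^[a-c][a-c]*$')      # abc

echo "$ahj $only_comma $digits_only $has_other $only_abc"
```

Anchoring a bracket expression with ^ and $, as in the last line, is how you require every character to come from the set.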

2.3 Extended regular expressions

2.3.1 +

+ matches the preceding character one or more times

[root@didiyun ~]# grep -E 'q+' linux.txt 
qq813898657.
[root@didiyun ~]# grep -E 'k+' linux.txt 
l like badminton ball , billiard ball and chinese chess!
[root@didiyun ~]# grep -E 'n+' linux.txt 
l teach linux
l like badminton ball , billiard ball and chinese chess!
[root@didiyun ~]# grep -E 'z+' linux.txt

2.3.2 ?

? matches the preceding character zero or one time; rarely used, but worth knowing
[root@didiyun ~]# grep -E 'go?d' linux.txt 
gd
god

2.3.3 |

| means "or": match any one of several strings

[root@didiyun ~]# grep -E 'qq|linux' linux.txt 
l teach linux
qq813898657.

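2.3.4 a{n,m}

a{n,m} matches the preceding character at least n and at most m times
a{n,} matches the preceding character at least n times
a{n} matches the preceding character exactly n times
a{,m} matches the preceding character at most m times (a GNU extension)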

[root@didiyun ~]# grep -E 'l{1,2}' linux.txt 
l
l
ll
l
ll
ll
ll
ll

2.3.5 () and \n

() groups the enclosed content so it is treated as a single unit
\n is used with (): it refers back to the text matched by the nth group
[root@didiyun ~]# grep -E 'g(oo|ww)d' linux.txt 
good
gwwd
[root@didiyun ~]# grep -E '(good)(gwwd)\2' linux.txt 
goodgwwdgwwd
[root@didiyun ~]# grep -E '(good)(gwwd)\1' linux.txt 
goodgwwdgood
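The sketch below puts the extended operators side by side on hypothetical words. It assumes GNU grep, because back-references such as \1 in an ERE are a GNU extension, not POSIX.

```shell
#!/bin/sh
# Hypothetical words, matched with grep -E (ERE)
words='gd
god
good
goodgood
gwwd'

plus=$(printf '%s\n' "$words" | grep -c -E 'go+d')          # god, good, goodgood
optional=$(printf '%s\n' "$words" | grep -c -E 'go?d')      # gd, god
interval=$(printf '%s\n' "$words" | grep -c -E 'go{2}d')    # good, goodgood
group_alt=$(printf '%s\n' "$words" | grep -E 'g(oo|ww)d' | paste -s -d ' ' -)
backref=$(printf '%s\n' "$words" | grep -E '(good)\1')      # group 1 repeated: goodgood

echo "$plus $optional $interval | $group_alt | $backref"
```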

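1.2 The middle one: sed

Stream editor: adds, deletes, modifies, and queries file content; matches lines
sed [options] 'sed command' [file]
	-n suppress the default output (automatic printing of every line); usually combined with the p command
	-i edit the file in place. Without -i, sed only changes its in-memory copy of the data and prints the result, and the file on disk is not affected; to make the change take effect on disk, you must use -i
	-e add another editing command; several -e expressions run in one sed process, which is different from chaining several sed commands with pipes
	-r use extended regular expressions (GNU sed; -E also works)
Built-in commands
	s substitute
	g global (a flag of s, as in s#old#new#g, not a command on its own)
	p print
	d delete
	a append: add one or more lines of text after the given line
	i insert: add one or more lines of text before the given line
	With s, the delimiter does not have to be /; characters such as # or @ work too. With #, for example,
	the text between the first two # is the old content to replace,
	and the text between the last two # is the new content.
	If s is used without g, only the first match on each line is replaced.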
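The sketch below runs the options and commands above on a scratch file. It assumes GNU sed: the one-line 1a hello form and -i without a suffix argument are GNU behaviour, and BSD/macOS sed differs.

```shell
#!/bin/sh
# Assumes GNU sed ("1a text" on one line, -i without a suffix)
tmp=$(mktemp -d)
printf '%s\n' 'l teach linux' 'linux linux' 'bye' > "$tmp/linux.txt"

# s without g: only the first match on each line is replaced
first_only=$(sed 's#linux#windows#' "$tmp/linux.txt" | sed -n '2p')
# s with g: every match on the line is replaced
global=$(sed 's#linux#windows#g' "$tmp/linux.txt" | sed -n '2p')
# -n plus p: print only the lines matching /bye/
printed=$(sed -n '/bye/p' "$tmp/linux.txt")
# d: delete every line containing linux
deleted=$(sed '/linux/d' "$tmp/linux.txt")
# a: append "hello" after line 1, so it becomes line 2 of the output
appended=$(sed '1a hello' "$tmp/linux.txt" | sed -n '2p')
# -e: two edits in one sed process, output lines joined with spaces
chained=$(sed -e 's#linux#LX#g' -e 's#bye#end#' "$tmp/linux.txt" | paste -s -d ' ' -)
# None of the commands above changed the file itself
untouched=$(sed -n '3p' "$tmp/linux.txt")
# -i writes the result back to the file
sed -i 's#bye#ciao#' "$tmp/linux.txt"
saved=$(sed -n '3p' "$tmp/linux.txt")

rm -r "$tmp"
echo "$first_only | $global | $printed | $deleted | $appended | $chained | $untouched | $saved"
```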
a. Query
1. Print lines 2-3 of the file
[root@wb ~]# sed -n '2,3p' linux.txt 
l like badminton ball , billiard ball and chinese chess!
llllll

2. Print lines 1 and 3 of the file
[root@test ~]# sed -n "1p;3p" linux.txt 
test
123


3. Find the lines containing linux
[root@wb ~]# sed -n '/linux/p' linux.txt 
l teach linux

b. Delete, replace, and add
1. Delete the lines containing linux
[root@wb ~]# sed  '/linux/d' linux.txt 
l like badminton ball , billiard ball and chinese chess!
llllll
l
llll

llllll

2. Replace linux with windows (without -i the file is unchanged; with -i the change is written to the file)
[root@wb ~]# sed 's#linux#windows#g' linux.txt 
l teach windows
l like badminton ball , billiard ball and chinese chess!
llllll
l
llll

llllll

[root@wb ~]# sed -i 's#linux#windows#g' linux.txt 
[root@wb ~]# cat linux.txt 
l teach windows
l like badminton ball , billiard ball and chinese chess!
llllll
l
llll

llllll

The g flag: without g, only the first match on each line is replaced
[root@wb ~]# sed  's#linux#windows#g' linux.txt 
l teach windows
l like badminton windows , billiard ball and chinese chess!windows
llllll
l windows windows windows
llll windows windows
windows
llllll
[root@wb ~]# sed  's#linux#windows#' linux.txt 
l teach windows
l like badminton windows , billiard ball and chinese chess!linux
llllll
l windows linux linux
llll windows linux
windows
llllll

The -e option

[root@wb ~]# sed -e 's#linux#windows#g' -e 's#llll#aaaa#g' linux.txt 
l teach windows
l like badminton windows , billiard ball and chinese chess!windows
aaaall
l windows windows windows
aaaa windows windows
windows
aaaall


3. Append a line after line 2
[root@wb ~]# sed '2a hello' linux.txt 
l teach linux
l like badminton linux , billiard ball and chinese chess!linux
hello
llllll
l linux linux linux
llll linux linux
linux
llllll

4. Insert a line before line 2
[root@wb ~]# sed '2i hello' linux.txt 
l teach linux
hello
l like badminton linux , billiard ball and chinese chess!linux
llllll
l linux linux linux
llll linux linux
linux
llllll


5. Delete a single line or a range of lines
[root@wb ~]# cat -n  linux.txt 
     1	l teach linux
     2	l like badminton linux , billiard ball and chinese chess!linux
     3	llllll
     4	l linux linux linux
     5	llll linux linux
     6	linux
     7	llllll
[root@wb ~]# sed  '3,5d' linux.txt 
l teach linux
l like badminton linux , billiard ball and chinese chess!linux
linux
llllll
[root@wb ~]# sed  '5d' linux.txt 
l teach linux
l like badminton linux , billiard ball and chinese chess!linux
llllll
l linux linux linux
linux
llllll

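1.2.1 Practice: get the NIC's IP address

On this machine the inet line of interest is line 9 of the ip a output; the line number (and the /24 prefix length) will differ on other machines.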

[root@wb ~]# ip a | sed -n '9p' | sed 's#^.*inet ##g'| sed 's#/24.*$##g'
192.168.92.7

[root@wb ~]# ip a | sed -n '9p' | sed -e's#^.*inet ##g' -e's#/24.*$##g'
192.168.92.7

[root@wb ~]# ip a | sed -rn '9s#^.*inet (.*)/24.*$#\1#gp'
192.168.92.7
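Relying on line 9 breaks as soon as the interface list changes. The sketch below selects the line by its content instead. It runs against a hypothetical excerpt of ip a output rather than a live system, and assumes GNU sed for -E.

```shell
#!/bin/sh
# Hypothetical excerpt of `ip a` output; real interface names and addresses differ
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.92.7/24 brd 192.168.92.255 scope global ens33'

# Drop the loopback line, then capture the address in front of "/" on any
# remaining inet line, whatever the prefix length is; -n plus p prints only that
ip=$(printf '%s\n' "$sample" | sed -E -n '/inet 127\./d; s#^ *inet ([0-9.]+)/.*$#\1#p')

echo "$ip"
```

Capturing [0-9.]+ up to the / also removes the dependence on a /24 prefix.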

1.2.2 Practice: get a file's permission bits

[root@wb ~]# stat /etc/hosts
  File: ‘/etc/hosts’
  Size: 158       	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 16778185    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:net_conf_t:s0
Access: 2020-12-25 09:50:51.949106205 +0800
Modify: 2013-06-07 22:31:32.000000000 +0800
Change: 2020-12-25 09:49:45.889110410 +0800
 Birth: -

[root@wb ~]# stat /etc/hosts | sed -n '4p'|sed 's#^.*(0##g'| sed 's#/-.*$##g'
644

[root@wb ~]# stat /etc/hosts | sed -rn '4s#^.*\(0(.*)\/-.*$#\1#gp'
644
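The sketch below has two parts. The first applies the one-step sed version to the Access line copied from the stat output above. The second uses GNU stat's %a format, which prints the octal permissions directly. This assumes GNU coreutils; BSD stat uses different flags.

```shell
#!/bin/sh
# The Access line as printed by stat above
line='Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)'

# ERE: \( is a literal parenthesis; ([0-7]+) captures the digits after the leading 0
perm=$(printf '%s\n' "$line" | sed -E 's#^Access: \(0([0-7]+)/.*$#\1#')

# GNU stat can print the octal mode directly with the %a format
tmp=$(mktemp)
chmod 640 "$tmp"
mode=$(stat -c '%a' "$tmp")
rm -f "$tmp"

echo "perm=$perm mode=$mode"
```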

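1.3 The eldest: awk

awk is especially handy for working with columns.

awk is one of the most important tools in Linux system administration. It is extremely powerful: it is not merely a file-processing command but a programming language in its own right. For advanced usage, refer to a book devoted to awk.
Book link

https://item.jd.com/10056195962440.html

Extracting columns with awk
awk [options] '[condition] {action}' file
	-F set the field separator (by default fields are separated by runs of spaces and tabs); putting the separator in double quotes makes it easier to read

Conditions and fields
$1    the first field (column)
$0    the whole line
$NF   the last field (NF holds the number of fields on the line)
$(NF-n) the (n+1)-th field from the end; for n = 1 this is the second-to-last field, and so on
NR    the line (record) number
~     regex match: to search in one field only, write the field, then ~, then the pattern

Actions
print  print the output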
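The sketch below demonstrates the field variables on one line copied from test.txt below, assuming a POSIX awk.

```shell
#!/bin/sh
line='bin:x:1:1:bin:/bin:/sbin/nologin'

third=$(printf '%s\n' "$line" | awk -F ':' '{print $3}')             # 1
third_fifth=$(printf '%s\n' "$line" | awk -F ':' '{print $3, $5}')   # the comma outputs a space (OFS)
last=$(printf '%s\n' "$line" | awk -F ':' '{print $NF}')             # /sbin/nologin
second_last=$(printf '%s\n' "$line" | awk -F ':' '{print $(NF-1)}')  # /bin
count=$(printf '%s\n' "$line" | awk -F ':' '{print NF}')             # number of fields
whole=$(printf '%s\n' "$line" | awk -F ':' '{print $0}')             # the unchanged line

echo "$third | $third_fifth | $last | $second_last | $count"
```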


Print the third column
[root@wb ~]# awk -F ":" '{print $3}' test.txt 
1
2
3
4
5

Print the third and fifth columns
[root@wb ~]# awk -F ":" '{print $3,$5}' test.txt 
1 bin
2 daemon
3 adm
4 lp
5 sync

Print the last column
[root@wb ~]# awk -F ":" '{print $NF}' test.txt 
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
/bin/sync

Print the second-to-last column
[root@wb ~]# awk -F ":" '{print $(NF-1)}' test.txt 
/bin
/sbin
/var/adm
/var/spool/lpd
/sbin

Selecting lines
[root@test ~]# awk 'NR==3' passwd 
adm:x:3:4:adm:/var/adm:/sbin/nologin
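To select several non-consecutive lines, separate the conditions with semicolons:
awk 'NR==3;NR==5' passwd 
This prints lines 3 and 5.
For a consecutive range of lines, use:
awk 'NR>2&&NR<7' passwd
This prints lines 3 through 6.
It can also be written as a range pattern: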
awk 'NR==3,NR==6' passwd
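The sketch below compares the line-selection forms on eight numbered lines standing in for passwd. It assumes a POSIX sh and awk.

```shell
#!/bin/sh
# Eight numbered lines stand in for passwd
lines=$(printf '%s\n' 1 2 3 4 5 6 7 8)

two_lines=$(printf '%s\n' "$lines" | awk 'NR==3;NR==5' | paste -s -d ' ' -)
also_two=$(printf '%s\n' "$lines" | awk 'NR==3||NR==5' | paste -s -d ' ' -)
range_and=$(printf '%s\n' "$lines" | awk 'NR>2&&NR<7' | paste -s -d ' ' -)
range_pat=$(printf '%s\n' "$lines" | awk 'NR==3,NR==6' | paste -s -d ' ' -)

echo "$two_lines | $also_two | $range_and | $range_pat"
```

The semicolon form is two separate pattern rules, so a line that satisfied both conditions would be printed twice. NR==3||NR==5 is a single rule and prints such a line once.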


Print the third column of line 2
[root@test ~]# awk -F ":"  'NR==2{print $3}' passwd 
3


Searching: enclose the text to search for between two / characters
[root@wb ~]# awk '/lp/' test.txt 
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[root@wb ~]# awk '/nologin/' test.txt 
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

Find shutdown and print the last column
[root@test ~]# awk -F ":" '/shutdown/{print $NF}' passwd 
/sbin/shutdown

Find the lines that do not start with s (empty lines are skipped too, because [^s] must match one character)
[root@wb ~]# awk '/^[^s]/' test.txt 
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

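To search in one field only, write the field, then ~, then the pattern.
For example,
$2~/root/
searches for root in the second field (anywhere inside the field, not only as the whole field).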
[root@wb ~]# awk -F ":" '$1~/sync/ {print $NF}' test.txt 
/bin/sync
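The sketch below applies ~ and !~ to hypothetical passwd-style lines. It shows that ~ matches anywhere inside the field unless the pattern is anchored.

```shell
#!/bin/sh
# Hypothetical passwd-style lines
data='root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
nosync:x:9:9:rootish:/home:/bin/sh'

# ~ is a substring regex match: both sync and nosync contain "sync"
contains=$(printf '%s\n' "$data" | awk -F ':' '$1~/sync/ {print $1}' | paste -s -d ' ' -)
# Anchor the pattern to compare against the whole field
exact=$(printf '%s\n' "$data" | awk -F ':' '$1~/^sync$/ {print $NF}')
# !~ selects the lines whose field does NOT match
not_sync=$(printf '%s\n' "$data" | awk -F ':' '$1!~/sync/ {print $1}')
# Without a field, /root/ searches the whole line (rootish matches too)
anywhere=$(printf '%s\n' "$data" | awk '/root/' | wc -l | tr -d ' ')

echo "$contains | $exact | $not_sync | $anywhere"
```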

1.3.1 Practice: get the NIC's IP address

[root@wb ~]# ip a | awk 'NR==9{print $2}'
192.168.92.7/24
[root@wb ~]# ip a | awk 'NR==9{print $2}'| awk -F "/" '{print $1}'
192.168.92.7
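A single awk program can do both steps and also skip the loopback address. The sketch below runs on a hypothetical excerpt of ip a output, so real interface names and addresses will differ. It uses awk's split() to cut the address at the /.

```shell
#!/bin/sh
# Hypothetical excerpt of `ip a` output
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.92.7/24 brd 192.168.92.255 scope global ens33'

# Default field splitting ignores leading spaces, so "inet" is $1 and "addr/prefix" is $2;
# split() cuts $2 at "/" into the array a, and a[1] is the address
ip=$(printf '%s\n' "$sample" | awk '$1 == "inet" && $2 !~ /^127\./ {split($2, a, "/"); print a[1]}')

echo "$ip"
```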

1.3.2 Practice: print the records whose score is greater than 70

[root@wb ~]# cat >> a.txt <<BBB
> z3 n 70
> ls l 80
> w2 l 75
> z6 n 100
> x8 l 50
> BBB
[root@wb ~]# cat a.txt 
z3 n 70
ls l 80
w2 l 75
z6 n 100
x8 l 50

[root@wb ~]# awk '$NF>70 {print $0}' a.txt 
ls l 80
w2 l 75
z6 n 100
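Beyond filtering, awk can aggregate. The sketch below works on the same five records. It computes the average score and a per-group average keyed on the second column. The original does not say what n and l stand for, so they are treated simply as group labels.

```shell
#!/bin/sh
# The same five records as a.txt: name, group label, score
scores='z3 n 70
ls l 80
w2 l 75
z6 n 100
x8 l 50'

above=$(printf '%s\n' "$scores" | awk '$NF>70 {print $1}' | paste -s -d ' ' -)
# END runs after the last line, when sum and NR hold the totals
average=$(printf '%s\n' "$scores" | awk '{sum += $NF} END {print sum/NR}')
# Arrays indexed by the group column accumulate per-group sums and counts
by_group=$(printf '%s\n' "$scores" |
  awk '{sum[$2] += $NF; n[$2]++} END {for (g in sum) printf "%s %.1f\n", g, sum[g]/n[g]}' |
  sort | paste -s -d ';' -)

echo "$above | $average | $by_group"
```

The for (g in sum) loop visits keys in an unspecified order, which is why the output is piped through sort.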