Bash 文本处理

文本处理

4.8.1. iconv - Convert encoding of given files from one encoding to another
cconv - A iconv based simplified-traditional chinese conversion tool

cconv是建立在iconv之上,可以UTF8编码直接转换,并增加了词转换。

sudo apt-get install cconv
			

使用cconv进行简繁转换的方法为:

cconv -f UTF8-CN -t UTF8-HK zh-cn.txt -o zh-hk.txt
			
uconv - convert data from one encoding to another

安装

sudo apt-get install libicu-dev
			

例子

$ uconv -f cp1252 -t UTF-8 -o file_in_utf8.txt file_in_cp1252_encoding.txt
			
4.8.2. 字符串处理命令expr
		
字符串处理命令expr用法简介:
名称:expr
用途:求表达式变量的值。
语法: expr Expression
实例如下:
例子1:字串长度
shell>> expr length "this is a test content";
22
例子2:求余数
shell>> expr 20 % 9
2
例子3:从指定位置处截取字符串
shell>> expr substr "this is a test content" 3 5
is is
例子4:指定字符串第一次出现的位置
shell>> expr index "testforthegame" s
3
例子5:字符串真实重现
shell>> expr quote thisisatestformela
thisisatestformela
		
		
4.8.3. cat - concatenate files and print on the standard output
-b	不对空白行编号。
-e	使用 $ 字符显示行尾。
-n	从 1 开始对所有输出行编号。
-q	使用静默操作(禁止错误消息)。
-r	将所有多个空行替换为单行(“压缩”空白)。
-t	将制表符显示为 ^I。
-u	不对输出进行缓冲。
-v	可视地显示非打印控制字符。
		
-s, --squeeze-blank suppress repeated empty output lines

-S 将多个空白行压缩到单行中(与 -r 相同)

			
$ cat >> /tmp/test <<EOF
Line1

Line2


Line3




Line4


Line5

EOF

$ cat -s /tmp/test
Line1

Line2

Line3

Line4

Line5

			
			
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB

显示控制字符。例如Tab等,下面例子查看文件结尾换行符类型

			
[neo@netkiller ~]# cat -v file.txt
GRANT USAGE ON *.* TO 'esauser'@'localhost' IDENTIFIED BY xxxxxxx; ^M
^M
file^M
2059^M
			
			
与管道配合使用
			
[log@logging tmp]$ cat <<EOF | grep 'm'
İsmail
Ahmet
Ali
Elif
Mehmet
EOF
İsmail
Ahmet
Mehmet			
			
			

多管道

			
cat <<EOF | grep 'm' | tee matched_names.txt
İsmail
Ahmet
Ali
Elif
Mehmet
EOF			
			
			
4.8.4. nl - number lines of files
$ nl /etc/issue
     1  CentOS release 5.4 (Final)
     2  Kernel \r on an \m
		
4.8.5. tr - translate or delete characters
		
[:alnum:] :所有字母字符与数字
[:alpha:] :所有字母字符
[:blank:] :所有水平空格
[:cntrl:] :所有控制字符
[:digit:] :所有数字
[:graph:] :所有可打印的字符(不包含空格符)
[:lower:] :所有小写字母
[:print:] :所有可打印的字符(包含空格符)
[:punct:] :所有标点字符
[:space:] :所有水平与垂直空格符
[:upper:] :所有大写字母
[:xdigit:] :所有 16 进位制的数字		
		
		
替换字符

":"替换为"\n"

			
$ cat /etc/passwd |tr ":" "\n"
			
			
			
[root@gitlab ~]# echo "/opt/netkiller.cn/www.netkiller.cn" | tr -- '/.' ':-'
:opt:netkiller-cn:www-netkiller-cn			
			
			
英文大小写转换

使用 tr '[:lower:]' '[:upper:]' 将小写字母替换成大写字母

			
[root@localhost ~]# echo "Helloworld" | tr '[:lower:]' '[:upper:]'
HELLOWORLD			
			
			

替换整段文字

						
[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 

[root@localhost ~]# cat /etc/redhat-release | tr '[:lower:]' '[:upper:]'
CENTOS LINUX RELEASE 7.5.1804 (CORE) 
			
			

			
[root@localhost ~]# echo "Netkiller" |  tr ‘a-z’ ‘A-Z’
NETKILLER			
			
			
			
neo@MacBook-Pro-M2 ~> uuidgen
71386AEE-C468-44E1-A0A3-FB4EBB4600AA

neo@MacBook-Pro-M2 ~> uuidgen | tr [:upper:] [:lower:]
3d807c48-ef5f-4297-869f-120cb713f752			
			
			
[CHAR*] 和 [CHAR*REPEAT]

			
[root@localhost ~]# echo "1234567890" | tr ‘1-5′ ‘[A*]‘ 
AAAAA67890

[root@localhost ~]# echo "1234567890" | tr ‘1-9′ ‘[A*5]BCDE’
AAAAABCDE0			
			
			
-s, --squeeze-repeats replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character

删除重复的字符

			
[root@localhost ~]# echo "My      nickname  is      netkiller." | tr -s ' '
My nickname is netkiller.	

[root@localhost ~]# echo "aaaabbbbccccdddd." | tr -s 'a'
abbbbccccdddd.

[root@localhost ~]# echo "aaaabbbbccccdddd." | tr -s 'a-z'
abcd.		
			
			
-d, --delete delete characters in SET1, do not translate

删除字符

			
[root@localhost ~]# echo "My nickname is netkiller" | tr -d ' '
Mynicknameisnetkiller
			
[root@localhost ~]# md5sum /etc/issue | tr -d [0-9] 
ffedfcfbdcaebdec  /etc/issue			
			
			

删除控制字符

			 
[root@netkiller ~]# cat file | tr -d [:cntrl:]
			
			
4.8.6. cut - remove sections from each line of files

列操作

$ last | grep  'neo' | cut -d ' ' -f1
        
$ cat /etc/passwd | cut -d ':' -f1
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy

$ cat /etc/passwd | cut -d ':' -f1,3,4

# cat /etc/passwd | cut -d ':' -f1,6
root:/root
bin:/bin
daemon:/sbin
adm:/var/adm
lp:/var/spool/lpd
sync:/sbin
shutdown:/sbin
halt:/sbin
mail:/var/spool/mail
uucp:/var/spool/uucp
operator:/root
games:/usr/games
gopher:/var/gopher
ftp:/var/ftp
nobody:/
vcsa:/dev
saslauth:/var/empty/saslauth
postfix:/var/spool/postfix
sshd:/var/empty/sshd
rpc:/var/cache/rpcbind
rpcuser:/var/lib/nfs
nfsnobody:/var/lib/nfs
ntp:/etc/ntp
nagios:/var/log/nagios

        

行操作

$ cat /etc/passwd | cut -c 1-4
root
daem
bin:
sys:
sync
game
man:

$ echo "No such file or directory"| cut -c4-7
such

$ echo "No such file or directory"| cut -c -8
No such

$ echo "No such file or directory"| cut -c-8
No such

        
4.8.7. printf - format and print data
printf "%d\n" 1234
		
$ printf "\033[1;33m TEST COLOR \n\033[m"
		
4.8.8. Free `recode' converts files between various character sets and surfaces.

Following will convert text files between DOS, Mac, and Unix line ending styles:

		
$ recode /cl../cr <dos.txt >mac.txt
$ recode /cr.. <mac.txt >unix.txt
$ recode ../cl <unix.txt >dos.txt
		
		
4.8.9. /dev/urandom 随机字符串
		
[neo@test .deploy]$ echo `< /dev/urandom tr -dc A-Z-a-z-0-9 | head -c 8`
GidAuuNN
[neo@test .deploy]$ echo `< /dev/urandom tr -dc A-Z-a-z-0-9 | head -c 8`
UyGaWSKr
		
		

我常常使用这样的随机字符初始化密码

		
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:alnum:] | head -c 8`
xig8Meym
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:alnum:] | head -c 8`
23Ac1vZg
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:digit:] | head -c 8`
73652314
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 8`
GO_o>OnJ
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 10`
iGy0FS/aO5
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 50`
;`E^{5(T4v~5$YovW.?%_?9la<`+qPcRh@7mD\!Whx;MJZVQ\K
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:print:] | head -c 50`
fy$[#:'(')jt'gp1/g-)d~p]8 :r9i;MO2d!8M<?Qs3t:QgK$O
[neo@test .deploy]$ echo `< /dev/urandom tr -dc [:graph:] | head -c 50`
6SivJ5y$/FTi8mf}rrqE&s0"WkA}r;uK-=MT!Wp0UlL_lF0|bL
		
		

批量生成

		
for i in {1..10}
do
echo `< /dev/urandom tr -dc A-Z-a-z-0-9 | head -c 8`
done
		
		
		
# cat /dev/urandom | tr -cd [:alnum:] | fold -w30 | head -n 20
AVqROzjF6ZATJGv2J6PzDHp3jLpKV4
ONt68UFNDwgXpSnLBV7oRDX3VLRYsX
EZTWCGvZc3mIEeuw9sxMtV8ZkzVRJv
BhUiv0a7utsjZFLYpKGZrY5aDXcZL4
5YfUl2hmDT1O9X61DRYg4wSp4lXoXX
ykyPJxH47PzxnNGlujIUF98ZtB01H0
QyP53mksQN8bCNNo1fSD3RtqhhEGfa
u2RkT1M9GUQF4a6O18tG5WD97OOXze
Whm5X7398Q8L9BONN8k2oLy9CL37JO
TmGQz7WB6WnkjhyB4wrBHBJ3HMIRyf
hww43yvddUDYUnbNOKjhv3sLhCA4YD
uY6zQtBC6miwLUl3jkCVVA0Xu8ASgj
jv58qu46VW7LvRIq4txNE8bG9NBlZl
pzaMkydAiCHCF5H2oQVqMn4DTTYgNL
yoN2A9LyrCwLfjP1ad9HMAwxExJL5i
J27iy2L90m9dpcPLJ8tl46GGb9xqmQ
6YwFCvuPHyyEwnctUTpqLFcvUafVZ2
Nuq9XgIgRQGynjlVqGLMOpO0MkGpsn
tChkRG7eoRuKVXgW7ccTGx45E54K3Y
qPv48XqdGlOrdULCOGZ45kwJ1v5kVX		
		
		
4.8.10. col - filter reverse line feeds from input

清除 ^M 字符

$ cat oldfile | col -b > newfile
		
4.8.11. apg - generates several random passwords
sudo apt-get install apg

$ apg

Please enter some random data (only first 16 are significant)
(eg. your old password):>
imlogNukcel5 (im-log-Nuk-cel-FIVE)
Drocdaf1 (Droc-daf-ONE)
fagJook0 (fag-Jook-ZERO)
heabugJer4 (heab-ug-Jer-FOUR)
5OsEsudy (FIVE-Os-Es-ud-y)
IrjOgneagOc9 (Irj-Og-neag-Oc-NINE)


$ apg -M SNCL -m 16
WoidWemFut6dryn,
byRowpEus-Flutt0
|QuogCagFaycsic0
ojHoadCyct4Freg_
Vir9blir`orhohoo
bapOip?Ibreawov2
		
4.8.12. head/tail
head -c 17 | tail -c 1
		
彩色输出
			
printf "%s" $(printf '\033[0;31m'); tail /etc/passwd		
			
			

			
tail -f example.log | sed \
-e "s/FATAL/"$'\e[31m'"&"$'\e[m'"/" \
-e "s/ERROR/"$'\e[31m'"&"$'\e[m'"/" \
-e "s/WARNING/"$'\e[33m'"&"$'\e[m'"/" \
-e "s/INFO/"$'\e[32m'"&"$'\e[m'"/" \
-e "s/DEBUG/"$'\e[34m'"&"$'\e[m'"/"			
			
			
跳过 n 行,输出后面内容

首先看看源文件内容

			
[root@netkiller ~]# head -n 5 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin			
			
			

现在跳过第一行,显示后面所有内容

			
[root@netkiller ~]# tail -n +2 /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
systemd-coredump:x:999:996:systemd Core Dumper:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
polkitd:x:998:995:User for polkitd:/:/sbin/nologin
sssd:x:997:994:User for sssd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/usr/share/empty.sshd:/sbin/
  • 12
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

netkiller-

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值