Linux Relearning Notes: cut

Latest recommended article published 2019-05-24 22:36:13

crazy0x90



Category: linux; Tags: linux

Article link: https://blog.csdn.net/notasheep/article/details/12835683


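Environment: Debian 7 weekly

Consider this a humble sequel to that Experiment 1.

I recently came across a blog called linux大棚 ("Linux greenhouse"). Great stuff~~

The cut command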

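-b (bytes): select by byte position

-c (characters): select by character position

-d: set a custom field delimiter (the default is TAB)

-f (fields): select by field

-n: do not split multibyte characters; used together with -b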
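A minimal sketch of the options above, using hypothetical one-line samples fed through printf instead of a real file (it assumes GNU coreutils cut, as on Debian):

```shell
# -b: select by byte position (here bytes 2 to 4)
bytes=$(printf 'abcdef\n' | cut -b 2-4)
echo "$bytes"      # prints: bcd

# -f without -d: fields are separated by TAB characters
by_tab=$(printf 'alpha\tbeta\tgamma\n' | cut -f 2)
echo "$by_tab"     # prints: beta

# -d sets a custom one-character delimiter
by_colon=$(printf 'alpha:beta:gamma\n' | cut -d: -f 3)
echo "$by_colon"   # prints: gamma
```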

root@debian:~# ps
  PID TTY          TIME CMD
 4062 pts/0    00:00:00 bash
 4110 pts/0    00:00:00 ps
 
root@debian:~# ps|cut -b 3
P
0
1
1

Here byte 3 is P!!

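P.S. On how positions are counted: they do not start at the first character you can see in the output. Position 1 is the leftmost column of the line (the column where the r of the root@debian prompt starts), so the two spaces before PID are bytes 1 and 2.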
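To make the counting rule concrete, here is a sketch that feeds a copy of the ps header line to cut. The header is a hypothetical fixed string, so the result does not depend on which processes happen to be running:

```shell
# Hypothetical fixed copy of the ps header: two spaces come before "PID"
header='  PID TTY          TIME CMD'

# Position 1 is the leftmost column of the line, even when it is a space
first=$(printf '%s\n' "$header" | cut -b 1)
third=$(printf '%s\n' "$header" | cut -b 3)
printf '[%s]\n' "$first"   # prints: [ ]
printf '[%s]\n' "$third"   # prints: [P]
```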

root@debian:~# ps|cut -b 3-5,7
PIDT
062p
113p
114p
 
root@debian:~# ps|cut -b 7,3-5
PIDT
062p
117p
118p
 
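So writing the list as 7,3-5 does not cause an error: cut always prints the selected bytes in their original left-to-right order, no matter how the list is written.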
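The same check on a fixed copy of the header line (again a hypothetical sample string) shows that both list orders give identical output:

```shell
# Hypothetical fixed copy of the ps header line
header='  PID TTY          TIME CMD'

# The same byte list written in two different orders
forward=$(printf '%s\n' "$header" | cut -b 3-5,7)
reversed=$(printf '%s\n' "$header" | cut -b 7,3-5)
echo "$forward"    # prints: PIDT
echo "$reversed"   # prints: PIDT
```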
 
$ ps |cut -b 3-
PID TTY          TIME CMD
062 pts/0    00:00:00 bash
123 pts/0    00:00:00 ps
124 pts/0    00:00:00 cut
 
$ ps |cut -b -5
  PID
 4062
 4125
 4126
 
$ ps |cut -b -5,5- 
  PID TTY          TIME CMD
 4062 pts/0    00:00:00 bash
 4127 pts/0    00:00:00 ps
 4128 pts/0    00:00:00 cut
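A sketch of open-ended ranges on a fixed copy of one ps row (a hypothetical sample string): -5 means bytes 1 to 5, 3- means byte 3 to the end of the line, and a byte selected by two overlapping ranges is printed only once:

```shell
# Hypothetical fixed copy of one ps output row
row=' 4062 pts/0    00:00:00 bash'

upto5=$(printf '%s\n' "$row" | cut -b -5)     # bytes 1-5
from3=$(printf '%s\n' "$row" | cut -b 3-)     # byte 3 to end of line
both=$(printf '%s\n' "$row" | cut -b -5,5-)   # byte 5 is in both ranges

printf '[%s]\n' "$upto5"   # prints: [ 4062]
echo "$from3"              # the row without its first two bytes
echo "$both"               # the whole row, with byte 5 printed once
```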

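Next, Chinese text. The sample file file_ch.txt holds three lines of Chinese prose about a Google program that pays students to write open-source code over the summer; the lines begin with G, 简 and 当: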

root@debian:/home/xyz# cat file_ch.txt
Google从2005年其就开始举办这样的全球性活动.
简单地用一句话概括一下，就是Google出钱给学生为开源项目写代码，而这个项目是在学生暑假期间举    行，被选择上并成功完成的学生最终能够获得5000美刀的奖金。
当       然，Google的这个项目最想得到的是：提供给学生机会参与到真实的软件开发中，在项目结束后能够有所收获，并且还能继续投入到开源中，为开源社区做贡献。
 
root@debian:/home/xyz# cat file_ch.txt|cut -b 1
G
�
�
 
root@debian:/home/xyz# cat file_ch.txt|cut -c 1
G
�
�
root@debian:/home/xyz# cat file_ch.txt|cut -c 2
o
�
�
root@debian:/home/xyz# cat file_ch.txt|cut -c 1,2
Go
�
�
root@debian:/home/xyz# cat file_ch.txt|cut -c 1,2,3
Goo
简
当
root@debian:/home/xyz# cat file_ch.txt|cut -c 3
o
�
�

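Comparing the outputs: on this system -c behaves exactly like -b. Both -c 1 and -b 1 print a broken fragment of a Chinese character, and only -c 1,2,3 yields a whole character (简, 当), because in UTF-8 each of these characters takes 3 bytes, not 2. In principle -c counts characters and -b counts bytes, but the GNU cut shipped with Debian does not handle multibyte characters and treats -c the same as -b (its manual describes -c as the same as -b for now).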
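A minimal sketch of the byte arithmetic, assuming the script file itself is saved as UTF-8 and using the first three characters of the file's second line as a hypothetical sample:

```shell
# Assumes this script is saved as UTF-8; the sample text is hypothetical
text='简单地'

# One Chinese character = 3 bytes in UTF-8, so bytes 1-3 form one character
whole=$(printf '%s\n' "$text" | cut -b 1-3)
echo "$whole"          # prints: 简

# Byte 1 alone is only a fragment; wc -c counts that byte plus the newline
fragment_len=$(( $(printf '%s\n' "$text" | cut -b 1 | wc -c) ))
echo "$fragment_len"   # prints: 2

# Byte count of a single Chinese character
char_len=$(( $(printf '简' | wc -c) ))
echo "$char_len"       # prints: 3
```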

Using -n

$ cat file_ch.txt | cut -nb 1,2,3

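On Debian 7 this gives practically the same result as -c (GNU cut accepts -n but ignores it); behavior on CentOS is still to be checked.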
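A quick comparison, with the same hypothetical UTF-8 sample and the same assumption that the script is saved as UTF-8. Since GNU cut ignores -n, both commands below select exactly bytes 1 to 3, which happen to form one whole character:

```shell
# Hypothetical UTF-8 sample, like the start of the file's second line
text='简单地'

with_n=$(printf '%s\n' "$text" | cut -nb 1,2,3)
without_n=$(printf '%s\n' "$text" | cut -b 1,2,3)
echo "$with_n"      # prints: 简
echo "$without_n"   # prints: 简
```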

Addendum: in GB2312 a Chinese character takes 2 bytes, while in UTF-8 it takes 3.

Debian 7 defaults to UTF-8, but some systems use a GB encoding. How do you change it? A Baidu search turned up a method on ChinaUnix that works:

In ~/.bashrc, if there are export lines like the ones below, change them to:
export LANG=zh_CN.UTF-8
export LC_CTYPE="zh_CN.UTF-8"
 
This way the terminal starts with UTF-8 encoding. Then set /etc/environment:
[liang@localhost ~]$ cat /etc/environment
LANGUAGE="zh_CN:zh:en_US:en"
LC_ALL=zh_CN.UTF-8
 
LANG=zh_CN
[liang@localhost ~]$

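-b and -c are generally only used for files with a fixed layout; for files without one, use -d and -f. So what happens if the field numbers after -f are given in reverse order?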

root@debian:/home/xyz# cat /etc/passwd |head -n5
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync

root@debian:~# cat /etc/passwd |head -n5| cut -d: -f1,4-5
root:0:root
daemon:1:daemon
bin:2:bin
sys:3:sys
sync:65534:sync

root@debian:~# cat /etc/passwd |head -n5| cut -d: -f4-5,1
root:0:root
daemon:1:daemon
bin:2:bin
sys:3:sys
sync:65534:sync

The order of the list after -f does not affect the output order: the fields always come out in their original order.
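A sketch reproducing this with the first /etc/passwd line above as a fixed sample string, so it does not depend on the accounts on the machine running it:

```shell
# Hypothetical fixed sample: the first /etc/passwd line shown above
entry='root:x:0:0:root:/root:/bin/bash'

forward=$(printf '%s\n' "$entry" | cut -d: -f1,4-5)
reversed=$(printf '%s\n' "$entry" | cut -d: -f4-5,1)
echo "$forward"    # prints: root:0:root
echo "$reversed"   # prints: root:0:root
```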

root@debian:~# cat /etc/passwd |head -n5| cut -d: -f 4-

0:root:/root:/bin/bash
1:daemon:/usr/sbin:/bin/sh
2:bin:/bin:/bin/sh
3:sys:/dev:/bin/sh
65534:sync:/bin:/bin/sync

root@debian:~# cat /etc/passwd |head -n5| cut -d: -f -4
root:x:0:0
daemon:x:1:1
bin:x:2:2
sys:x:3:3
sync:x:4:65534

Note that this includes the 4th field.
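Both open-ended field ranges include the boundary field. A sketch on the last sample line above, used as a fixed string:

```shell
# Hypothetical fixed sample: the last /etc/passwd line shown above
entry='sync:x:4:65534:sync:/bin:/bin/sync'

upto4=$(printf '%s\n' "$entry" | cut -d: -f -4)   # fields 1 through 4
from4=$(printf '%s\n' "$entry" | cut -d: -f 4-)   # field 4 through the last
echo "$upto4"   # prints: sync:x:4:65534
echo "$from4"   # prints: 65534:sync:/bin:/bin/sync
```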

To split on spaces, put the space in single quotes:

$ ls -l | cut -d' ' -f 1
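A sketch on a hypothetical single-spaced line (made up, not real ls -l output, whose columns are often padded with several spaces):

```shell
# Hypothetical single-spaced line in the style of ls -l output
line='drwxr-xr-x 2 root root 4096 Jan 1 00:00 bin'

perms=$(printf '%s\n' "$line" | cut -d' ' -f 1)
owner=$(printf '%s\n' "$line" | cut -d ' ' -f 3)   # a space after -d also works
echo "$perms"   # prints: drwxr-xr-x
echo "$owner"   # prints: root
```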

Splitting on special characters such as TAB:

First check which character the file actually uses with:

$ sed -n l file_ch.txt

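The argument after -n is a lowercase L (l), not the digit 1. Chinese characters are printed as escaped UTF-8 byte codes, the end of each line shows as $, and a TAB shows as \t.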
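A sketch showing what sed -n l prints for a TAB, and how to then pass a literal TAB to -d in a POSIX shell (the sample text and the tab variable name are hypothetical):

```shell
# Make a TAB visible: sed's l command prints it as \t and marks the line end with $
visible=$(printf 'name\tvalue\n' | sed -n l)
printf '%s\n' "$visible"   # prints: name\tvalue$

# cut already splits on TAB by default; to name it explicitly, pass a real TAB to -d
tab=$(printf '\t')
value=$(printf 'name\tvalue\n' | cut -d "$tab" -f 2)
echo "$value"              # prints: value
```

printf is used instead of echo for the first result because some shells' echo would turn the literal \t back into a TAB.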

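One more thing: cut's -d does not support regular expressions (it takes a single character), so its biggest drawback is that it cannot handle fields separated by runs of multiple spaces.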
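A sketch of the problem and one common workaround, squeezing repeated spaces with tr -s before calling cut (the sample line is hypothetical):

```shell
# Hypothetical line with three spaces between the two words
line='a   b'

# Every single space is a delimiter, so field 2 is the empty string between spaces 1 and 2
naive=$(printf '%s\n' "$line" | cut -d' ' -f 2)
# Squeeze runs of spaces into one first; then cut works as expected
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f 2)
printf '[%s]\n' "$naive"      # prints: []
printf '[%s]\n' "$squeezed"   # prints: [b]
```

awk '{print $2}' is another common choice, because awk splits on runs of blanks by default.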
