Linux text extraction methods: extracting the first 100 lines from a file

功勋Web工程师

Published 2024-05-16 05:18:19

Tags: operations, linux, interview

Article link: https://blog.csdn.net/m0_61549781/article/details/138937308

  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,… (powers of 1000).
Binary prefixes can be used, too: KiB=K, MiB=M, and so on.

CHUNKS may be:
  N       split into N files based on size of input
  K/N     output Kth of N to stdout
  l/N     split into N files without splitting lines/records
  l/K/N   output Kth of N to stdout without splitting lines/records
  r/N     like 'l' but use round robin distribution
  r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: https://www.gnu.org/software/coreutils/
Full documentation https://www.gnu.org/software/coreutils/split
or available locally via: info '(coreutils) split invocation'
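
The CHUNKS forms listed in the help text can be tried on a small file. The following is a minimal sketch, assuming GNU split; the sample file, the `part_` prefix, and the variable names are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > file.txt          # sample input: the numbers 1..25, one per line

# l/3: split into 3 files of roughly equal size without breaking any line.
split -n l/3 file.txt part_

# l/2/3: print only the 2nd of 3 line-aligned chunks to stdout; no files are written.
second_chunk=$(split -n l/2/3 file.txt)

# Concatenating the pieces in name order reproduces the input.
cat part_* > rejoined.txt
cmp -s file.txt rejoined.txt && rejoin_status=identical || rejoin_status=different
chunk_count=$(ls part_* | wc -l)
echo "$chunk_count chunks, rejoined copy is $rejoin_status"
```

Because the `l/` forms never cut a line in half, the pieces can differ slightly in size.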

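# Split a file into pieces of a given size

-b sets the size of each piece; renamefile is the prefix of the output files, which get alphabetical suffixes in order (renamefileaa, renamefileab, and so on). With no prefix given, split uses x, producing xaa, xab, and so on.

split -b 1M file.txt renamefile

# Split a file into pieces of a given number of lines

-l sets the number of lines per piece; renamefile is again the prefix of the output files

split -l 100 file.txt renamefile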
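
To see which files these commands produce, here is a small sketch under stated assumptions: GNU split, a generated 250-line sample file, and a hypothetical `chunk` prefix for the second run. The `-d` option (numeric suffixes) is a GNU option that the examples above do not use.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 250 > file.txt                 # sample input with 250 lines

# 100 lines per piece: 250 lines give renamefileaa, renamefileab, renamefileac.
split -l 100 file.txt renamefile
piece_names=$(ls renamefile* | tr '\n' ' ')
last_piece_lines=$(wc -l < renamefileac)   # the last piece holds the remainder

# -d switches to numeric suffixes: chunk00, chunk01, ...
split -d -b 512 file.txt chunk
first_numeric=$(ls chunk* | head -n 1)

echo "$piece_names"
echo "last piece: $last_piece_lines lines; first numeric piece: $first_numeric"
```

The first piece, renamefileaa, holds exactly the first 100 lines of the input.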


### 5. Splitting files with sed


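sed is a powerful tool: besides extracting parts of a file, it can edit the content itself, for example by inserting or replacing text. The system help output follows, for reference only.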

Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
      --debug
                 annotate program execution
  -e script, --expression=script
                 add the script to the commands to be executed
  -f script-file, --file=script-file
                 add the contents of script-file to the commands to be executed
      --follow-symlinks
                 follow symlinks when processing in place
  -i[SUFFIX], --in-place[=SUFFIX]
                 edit files in place (makes backup if SUFFIX supplied)
  -c, --copy
                 use copy instead of rename when shuffling files in -i mode
  -b, --binary
                 does nothing; for compatibility with WIN32/CYGWIN/MSDOS/EMX
                 (open files in binary mode; CR+LF are not processed specially)
  -l N, --line-length=N
                 specify the desired line-wrap length for the `l' command
      --posix
                 disable all GNU extensions.
  -E, -r, --regexp-extended
                 use extended regular expressions in the script
                 (for portability use POSIX -E).
  -s, --separate
                 consider files as separate rather than as a single,
                 continuous long stream.
      --sandbox
                 operate in sandbox mode (disable e/r/w commands).
  -u, --unbuffered
                 load minimal amounts of data from the input files and flush
                 the output buffers more often
  -z, --null-data
                 separate lines by NUL characters
      --help     display this help and exit
      --version  output version information and exit

If no -e, --expression, -f, or --file option is given, then the first
non-option argument is taken as the sed script to interpret. All
remaining arguments are names of input files; if no input files are
specified, then the standard input is read.

GNU sed home page: https://www.gnu.org/software/sed/.
General help using GNU software: https://www.gnu.org/gethelp/.
E-mail bug reports to: bug-sed@gnu.org.


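Examples of extracting parts of a file (as the outputs show, file.txt holds the numbers 1 to 25, one per line):

### Extract specific lines, printing them or writing them to a file
# Print line 12
sed -n '12p' file.txt

[root@vmgmt ~]# sed -n '12p' file.txt
12

# Write line 12 to a new file: redirect the output with ">" (overwrite) or ">>" (append), followed by the new file name
sed -n '12p' file.txt > newfile.txt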
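
The difference between the two redirections can be checked with a short sketch. The sample file and the `written.txt` name are made up for illustration, and the `w` command is a sed feature that the examples here do not otherwise use.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > file.txt                    # sample input: 1..25, one per line

sed -n '12p' file.txt >  newfile.txt   # ">" creates or truncates newfile.txt
sed -n '13p' file.txt >> newfile.txt   # ">>" appends to what is already there
redirected=$(cat newfile.txt)

# sed's own w command writes the selected line to a file without shell redirection.
sed -n '12w written.txt' file.txt
written=$(cat written.txt)

echo "$redirected"
echo "$written"
```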

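# Print the last line of the file
sed -n '$p' file.txt

## Print several specific lines
sed -n -e '2p' -e '5p' file.txt

[root@vmgmt ~]# sed -n -e '2p' -e '5p' file.txt
2
5

## Print lines 10 through 15 of file.txt (10,+5 is GNU sed syntax for line 10 plus the 5 lines after it)
sed -n '10,+5p' file.txt

[root@vmgmt ~]# sed -n '10,+5p' file.txt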
10
11
12
13
14
15
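
The same range can be written in more than one way. This is a minimal sketch, assuming GNU sed for the `+5` form and a generated sample file; the variable names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > file.txt                          # sample input: 1..25, one per line

gnu_form=$(sed -n '10,+5p' file.txt)         # GNU extension: line 10 plus the next 5
posix_form=$(sed -n '10,15p' file.txt)       # portable: explicit first and last line
early_exit=$(sed -n '10,15p;15q' file.txt)   # same output, but stops reading at line 15

echo "$early_exit"
```

On large files the `15q` variant avoids scanning the rest of the input after the range has been printed.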

# Print the first 5 lines (q quits after line 5)
sed -e '5q' file.txt

[root@vmgmt ~]# sed -e '5q' file.txt
1
2
3
4
5
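
The same idea scales to the first 100 lines mentioned in the title. A small sketch, assuming a generated 250-line sample file; `head` and `awk` are shown only as common alternatives.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 250 > file.txt                       # sample input with 250 lines

with_sed=$(sed '100q' file.txt)            # print lines as they are read, quit after line 100
with_head=$(head -n 100 file.txt)          # the dedicated tool for this job
with_awk=$(awk 'NR > 100 { exit } { print }' file.txt)   # stop before printing line 101

# To save the result rather than capture it:
sed '100q' file.txt > first100.txt
saved_lines=$(wc -l < first100.txt)
echo "saved $saved_lines lines"
```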

# Print the even-numbered lines of the file
sed -n 'n;p' file.txt

[root@vmgmt ~]# sed -n 'n;p' file.txt
2
4
6
8
10
12
14
16
18
20
22
24

# Print the odd-numbered lines of the file
sed -n 'p;n' file.txt

[root@vmgmt ~]# sed -n 'p;n' file.txt
1
3
5
7
9
11
13
15
17
19
21
23
25
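
GNU sed also offers a `first~step` address that states the stride directly. The sketch below assumes GNU sed and a generated sample file; the awk line is a portable alternative and is not taken from the examples here.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > file.txt                         # sample input: 1..25, one per line

even_lines=$(sed -n '2~2p' file.txt)        # start at line 2, then every 2nd line
odd_lines=$(sed -n '1~2p' file.txt)         # start at line 1, then every 2nd line
every_fifth=$(sed -n '5~5p' file.txt)       # lines 5, 10, 15, 20, 25
even_awk=$(awk 'NR % 2 == 0' file.txt)      # portable: keep lines with an even line number

echo "$every_fifth"
```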

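############### Regular-expression operations: the pattern goes between the two forward slashes / /; see a regular-expression reference for the syntax.
# Print every line that contains the character 2
sed -n '/2/p' file.txt

[root@vmgmt ~]# sed -n '/2/p' file.txt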
2
12
20
21
22
23
24
25

# Print every line that starts with the character 1
sed -n '/^1/p' file.txt

[root@vmgmt ~]# sed -n '/^1/p' file.txt
1
10
11
12
13
14
15
16
17
18
19

# Print every line that ends with the character 1
sed -n '/1$/p' file.txt

[root@vmgmt ~]# sed -n '/1$/p' file.txt
1
11
21

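# Combine start and end anchors: print lines that start with 2 or end with 3 (the | alternation needs extended regular expressions, so pass -r or -E)
sed -rn '/^2|3$/p' file.txt

[root@vmgmt ~]# sed -rn '/^2|3$/p' file.txt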
2
3
13
20
21
22
23
24
25
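
Whether `|` means alternation depends on the regular-expression flavour. A minimal sketch of the difference, assuming GNU sed and a generated sample file; the variable names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > file.txt                             # sample input: 1..25, one per line

# Basic regex (no -E/-r): | is an ordinary character, so nothing in this file matches.
bre_plain=$(sed -n '/^2|3$/p' file.txt)

# Extended regex: | is alternation. -E is the portable spelling of -r.
ere=$(sed -En '/^2|3$/p' file.txt)

# GNU sed also accepts \| as alternation in basic regex mode.
bre_escaped=$(sed -n '/^2\|3$/p' file.txt)

echo "$ere"
```

Line 23 satisfies both alternatives but is printed only once, because the address is a single pattern.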

# Print what remains after excluding or deleting specific lines
# Delete everything except lines 2-6
sed '2,6!d' file.txt

[root@vmgmt ~]# sed '2,6!d' file.txt
2
3
4
5
6

# Print everything except lines 2-6
sed '2,6d' file.txt

[root@vmgmt ~]# sed '2,6d' file.txt
1
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

# Delete empty lines from the file
sed '/^$/d' file.txt
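
An empty line is matched by `^$` (start of line immediately followed by end of line). Lines that contain only spaces or tabs are not empty in that sense. The sketch below uses a made-up sample file, `blanks.txt`, to show both cases.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Sample input: a, an empty line, a spaces-only line, b, a tab-only line, c
printf 'a\n\n  \nb\n\t\nc\n' > blanks.txt

empty_only=$(sed '/^$/d' blanks.txt)              # removes only the truly empty line
all_blank=$(sed '/^[[:space:]]*$/d' blanks.txt)   # also removes whitespace-only lines

echo "$all_blank"
```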

# Delete by regular expression
# Delete every line that contains the character 1
sed '/1/d' file.txt

[root@vmgmt ~]# sed '/1/d' file.txt
2
3
4
5
6
7
8
9
20
22
23
24
25
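
This section notes that sed can also insert and replace text, which the examples above do not cover. Here is a short sketch under stated assumptions: a generated five-line sample file, illustrative replacement text, and the `-i[SUFFIX]` option from the help output used with a hypothetical `.bak` suffix.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 5 > file.txt                         # sample input: 1..5, one per line

# Replace: s/old/new/ rewrites the first match on each line.
replaced=$(sed 's/3/three/' file.txt)

# Insert: i\ places the following text before the addressed line (line 2 here).
inserted=$(sed '2i\
inserted line' file.txt)

# Edit the file itself, keeping the original as file.txt.bak.
sed -i.bak 's/5/five/' file.txt
edited_last=$(tail -n 1 file.txt)
backup_last=$(tail -n 1 file.txt.bak)

echo "$replaced"
echo "$inserted"
```

Without `-i`, sed only writes the result to standard output and leaves the input file unchanged.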


### 6. Splitting files with cut


Help output for reference:

[root@vmgmt ~]# cut --help
Usage: cut OPTION... [FILE]...
Print selected parts of lines from each FILE to standard output.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -b, --bytes=LIST        select only these bytes
  -c, --characters=LIST   select only these characters
  -d, --delimiter=DELIM   use DELIM instead of TAB for field delimiter
  -f, --fields=LIST       select only these fields; also print any line
                            that contains no delimiter character, unless
                            the -s option is specified
  -n                      with -b: don't split multibyte characters
      --complement        complement the set of selected bytes, characters
                            or fields
  -s, --only-delimited    do not print lines not containing delimiters
      --output-delimiter=STRING  use STRING as the output delimiter
                            the default is to use the input delimiter
  -z, --zero-terminated   line delimiter is NUL, not newline
      --help     display this help and exit
      --version  output version information and exit

Use one, and only one of -b, -c or -f. Each LIST is made up of one
range, or many ranges separated by commas. Selected input is written
in the same order that it is read, and is written exactly once.
Each range is one of:

  N     N'th byte, character or field, counted from 1
  N-    from N'th byte, character or field, to end of line
  N-M   from N'th to M'th (included) byte, character or field
  -M    from first to M'th (included) byte, character or field

GNU coreutils online help: https://www.gnu.org/software/coreutils/
Full documentation https://www.gnu.org/software/coreutils/cut
or available locally via: info '(coreutils) cut invocation'
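
The options in the help output can be tried on a small delimited file. The following is a minimal sketch, assuming GNU cut; `users.txt` and its contents are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Sample input: colon-separated fields, similar in shape to /etc/passwd entries.
printf 'alice:x:1000:/home/alice\nbob:x:1001:/home/bob\n' > users.txt

names=$(cut -d: -f1 users.txt)                 # field 1 of each line
name_uid=$(cut -d: -f1,3 users.txt)            # fields 1 and 3, joined by the input delimiter
first3=$(cut -c1-3 users.txt)                  # characters 1 through 3 of each line
spaced=$(cut -d: -f1,3 --output-delimiter=' ' users.txt)   # same fields, joined by a space
without_f2=$(cut -d: -f2 --complement users.txt)           # every field except field 2

echo "$name_uid"
echo "$without_f2"
```

Note that cut works within each line: it selects columns, whereas split and sed above select whole lines.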