听人说做文本分类时处理100G的文本文件,居然不用大数据,处理方法就是用shell的split去分割成若干小文件。
split命令
NAME
split - split a file into pieces
SYNOPSIS
split [OPTION] [INPUT [PREFIX]]
DESCRIPTION
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is
‘x’. With no INPUT, or when INPUT is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N
use suffixes of length N (default 2)
-b, --bytes=SIZE
put SIZE bytes per output file
-C, --line-bytes=SIZE
put at most SIZE bytes of lines per output file
-d, --numeric-suffixes
use numeric suffixes instead of alphabetic
-l, --lines=NUMBER
put NUMBER lines per output file
--verbose
print a diagnos