I. Bugs
1. Pre-tokenizing the dataset runs out of memory (OOM)
Fix: add the streaming parameter to the YAML file
# tokenize
streaming: True
max_steps: 10000
streaming: bool = field(
    default=False,
    metadata={"help": "Enable dataset streaming."},
)
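A rough sketch of why streaming avoids the OOM and why max_steps is set alongside it: samples are tokenized one at a time as they are consumed, and because a stream has no known length, the loop is bounded by a step count. This is plain Python for illustration only, not the framework's actual code; tokenize, stream_tokenized, train and endless_corpus are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for a real tokenizer: split on whitespace.
    return text.split()


def stream_tokenized(samples: Iterable[str]) -> Iterator[list[str]]:
    # Tokenize lazily: each sample is processed only when it is requested,
    # so the fully tokenized dataset is never held in memory at once.
    for text in samples:
        yield tokenize(text)


def train(samples: Iterable[str], max_steps: int) -> int:
    # A stream has no known length, so the loop is bounded by max_steps
    # instead of by a number of epochs.
    steps = 0
    for _tokens in islice(stream_tokenized(samples), max_steps):
        steps += 1  # a real loop would run one optimizer step here
    return steps


def endless_corpus() -> Iterator[str]:
    # Hypothetical unbounded data source.
    i = 0
    while True:
        yield f"sample number {i}"
        i += 1


steps_run = train(endless_corpus(), max_steps=5)
print(steps_run)
```

Without a step limit, the loop over endless_corpus() would never terminate, which is the same reason max_steps appears next to streaming in the YAML above.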