Error:
File "/checkpoint/binary/train_package/megatron/training/arguments.py", line 243, in validate_args
    assert args.micro_batch_size is not None
But the argument is actually set:
args.micro_batch_size = 1
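Fix:
This cluster is a bit unusual: when passing the data config, the entries must not be separated by newlines. They have to be separated by spaces instead.
For example: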
ratio1 path1 ratio2 path2
DATA_CONFIG_PATH=$CODE_PATH/Pai-Megatron-Patch/data_ratios/CT_qwen14B_17lan_dclm.txt
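
To convert an existing multi-line ratio file into the single-line form shown above, something like the following could work. This is only a minimal sketch: `flatten_data_config` and `flatten_data_config_file` are hypothetical helpers (not part of Megatron or Pai-Megatron-Patch), and it assumes the file holds whitespace-separated ratio/path pairs whose paths contain no spaces.

```python
from pathlib import Path


def flatten_data_config(text: str) -> str:
    """Collapse a 'ratio path' list onto one space-separated line.

    Assumption: tokens come in ratio/path pairs and no path contains spaces.
    """
    # str.split() with no argument splits on any whitespace, including newlines.
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected ratio/path pairs, got an odd number of tokens")
    return " ".join(tokens)


def flatten_data_config_file(path: str) -> None:
    """Rewrite the file at `path` in place as a single space-separated line."""
    p = Path(path)
    p.write_text(flatten_data_config(p.read_text()))


if __name__ == "__main__":
    # Newline-separated input becomes the single-line form.
    print(flatten_data_config("ratio1 path1\nratio2 path2\n"))
```

Running `flatten_data_config_file` once on the file that `DATA_CONFIG_PATH` points to, before launching training, would leave it in the space-separated layout.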