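There are two cases:
1. If the previous run was interrupted before it finished the specified number of epochs, just add --resume followed by the path to that run's last.pt on the command line.
For example: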
python train.py --resume runs/exp2/weights/last.pt
You can also pass --resume with no path. The program then automatically picks the most recently modified last.pt among all exp/weights directories under runs:
python train.py --resume
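All other arguments (--epochs, --batch-size, and so on) are read automatically from the previous run (they are saved in runs/exp2/opt.yaml), so there is no need to specify them, and specifying them has no effect.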
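The automatic lookup used by a bare --resume can be pictured with a small sketch. This is only an illustration of "newest last.pt under runs", not the code train.py actually runs; the function name find_latest_checkpoint and the exact glob pattern are assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(search_dir: str = "runs") -> Optional[Path]:
    """Return the most recently modified last.pt under search_dir, or None.

    Hypothetical helper: it assumes checkpoints live in <run>/weights/last.pt.
    """
    candidates = list(Path(search_dir).glob("**/weights/last.pt"))
    if not candidates:
        return None
    # The newest modification time wins, regardless of the run's name.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    print(find_latest_checkpoint("runs"))
```

Because the choice depends on modification time only, passing the path explicitly is the safer option when several runs were touched recently.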
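2. If the previous run finished the specified number of epochs and you want to keep training:
First open runs/exp2/opt.yaml and change epochs to the total number of epochs (epochs already finished + epochs you want to add). For example, if the previous run finished 200 epochs and you want 100 more, set it to 300.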
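If you would rather not edit the file by hand, the change can be scripted. A minimal sketch, assuming opt.yaml contains a top-level line of the form "epochs: 200"; set_total_epochs is a hypothetical helper, not something shipped with the repository.

```python
import re
from pathlib import Path


def set_total_epochs(opt_path: str, finished: int, extra: int) -> int:
    """Rewrite the top-level `epochs:` line to finished + extra; return the total."""
    total = finished + extra
    path = Path(opt_path)
    text = path.read_text()
    # Touch only a top-level `epochs: <int>` line and leave every other key alone.
    new_text, count = re.subn(
        r"(?m)^epochs:[ \t]*\d+[ \t]*$", f"epochs: {total}", text
    )
    if count != 1:
        raise ValueError(f"expected exactly one top-level 'epochs:' line, found {count}")
    path.write_text(new_text)
    return total


if __name__ == "__main__":
    # 200 epochs already finished, 100 more wanted: epochs becomes 300.
    print(set_total_epochs("runs/exp2/opt.yaml", finished=200, extra=100))
```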
Then find this part in train.py, at around line 190,
and change
start_epoch = ckpt['epoch'] + 1
to
start_epoch = <number of epochs already finished>
With 200 finished epochs, that section then reads:
# Epochs
# start_epoch = ckpt['epoch'] + 1
start_epoch = 200
if opt.resume:
    assert start_epoch > 0, '%s training to %g epochs is finished, nothing to resume.' % (weights, epochs)
    shutil.copytree(wdir, wdir.parent / f'weights_backup_epoch{start_epoch - 1}')
if epochs < start_epoch:
    logger.info('%s has been trained for %g epochs. Fine-tuning for %g additional epochs.' %
                (weights, ckpt['epoch'], epochs))
    epochs += ckpt['epoch']  # finetune additional epochs
del ckpt, state_dict
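Then run with --resume as in case 1:
python train.py --resume runs/exp2/weights/last.pt
Once this run is done, restore the original start_epoch line in train.py. Otherwise every later run that loads a checkpoint will also start at epoch 200.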
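To see why case 2 needs the hard-coded start_epoch, here is a simplified model of the epoch logic in the train.py excerpt. plan_epochs and forced_start are hypothetical names. The sketch also assumes that the checkpoint of a finished run stores its epoch as -1, which is my understanding of what YOLOv5 writes when it finalizes the weights; check your own last.pt before relying on it.

```python
from typing import Optional, Tuple


def plan_epochs(ckpt_epoch: int, epochs: int, resume: bool,
                forced_start: Optional[int] = None) -> Tuple[int, int]:
    """Return (start_epoch, epochs), mirroring the train.py excerpt."""
    # forced_start models the manual edit `start_epoch = 200`.
    start_epoch = ckpt_epoch + 1 if forced_start is None else forced_start
    if resume and start_epoch <= 0:
        raise ValueError("training is finished, nothing to resume")
    if epochs < start_epoch:
        epochs += ckpt_epoch  # fine-tune additional epochs
    return start_epoch, epochs


if __name__ == "__main__":
    # Case 1: interrupted during epoch 120 of 200, resumes at 121.
    print(plan_epochs(120, 200, resume=True))  # (121, 200)
    # Case 2: finished run (assumed epoch == -1) with the hard-coded start.
    start, total = plan_epochs(-1, 300, resume=True, forced_start=200)
    print(start, total, len(range(start, total)))  # 200 300 100
```

Without forced_start, the second call would hit the "nothing to resume" check, which is the assertion that the edit works around. Since epochs are counted from 0, a run of 200 epochs covers 0 to 199, so starting at 200 with epochs set to 300 gives exactly 100 more.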