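- Try recalling the code yourself against the outline below; train.py still has plenty of small details worth learning from
- Ask yourself: why is the official code written so elegantly?
- Other people's code is a long-haired beauty in black stockings, so why does mine always fall a bit short?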
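1. parse_opt function
2. main function
   - main function: print arguments / check and install the environment
   - main function: whether to resume training from a checkpoint
   - main function: whether to use distributed training
   - main function: whether to run evolution training (genetic-algorithm hyperparameter tuning)
3. train function
   - train function: basic configuration
   - train function: model loading / resuming from a checkpoint
   - train function: frozen training / choosing which layers to freeze
   - train function: image size / batch size settings
   - train function: optimizer choice / parameter-group settings
   - train function: learning rate / EMA / normalization / single-machine multi-GPU
   - train function: data loading / anchor adjustment
   - train function: training configuration / multi-scale training / warmup
   - train function: end of training / printing info / saving results
4. run function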
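- If recalling it still feels a bit thin, then:
- Here is a blog post I think is well written and recommend to everyone: yolov5代码解读之train.py【训练模型】 ("YOLOv5 code walkthrough: train.py [training the model]"), CSDN blog
- Below I only share some of the key technical points; for the rest, read the official code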
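- Frozen training
- Layers are frozen by setting requires_grad = False on their parameters. `freeze=[n]` freezes layers 0 to n-1, while a longer list freezes exactly the listed layer indices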
```python
# Freeze
freeze = [f"model.{x}." for x in (freeze if len(freeze) > 1 else range(freeze[0]))]  # layers to freeze
for k, v in model.named_parameters():
    v.requires_grad = True  # train all layers
    # v.register_hook(lambda x: torch.nan_to_num(x))  # NaN to 0 (commented for erratic training results)
    if any(x in k for x in freeze):
        LOGGER.info(f"freezing {k}")
        v.requires_grad = False
```
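A minimal sketch of the same freezing rule on a hypothetical toy model (`TinyNet` and `freeze_layers` are illustration-only names, not from the repo). Storing the layers under `self.model` makes parameter names look like YOLO's (`model.0.weight`, ...), and the check after one SGD step confirms that frozen weights do not move.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical toy model: 12 Linear layers under `self.model`, so names look like "model.10.weight"."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(*[nn.Linear(4, 4) for _ in range(12)])

    def forward(self, x):
        return self.model(x)


def freeze_layers(model, freeze):
    """Same rule as train.py: [n] freezes layers 0..n-1, a longer list freezes exactly those indices."""
    prefixes = [f"model.{x}." for x in (freeze if len(freeze) > 1 else range(freeze[0]))]
    frozen = []
    for k, v in model.named_parameters():
        v.requires_grad = True  # start from "train everything"
        if any(p in k for p in prefixes):
            v.requires_grad = False  # exclude this parameter from gradient updates
            frozen.append(k)
    return frozen


net = TinyNet()
frozen = freeze_layers(net, [2])  # freeze model.0 and model.1
trainable = [k for k, v in net.named_parameters() if v.requires_grad]
print(frozen)

# One optimizer step on the trainable parameters only; the frozen layer must stay unchanged
before = net.model[0].weight.detach().clone()
opt = torch.optim.SGD([p for p in net.parameters() if p.requires_grad], lr=0.1)
net(torch.randn(8, 4)).sum().backward()
opt.step()
after = net.model[0].weight.detach()
print(torch.equal(before, after))
```

The trailing dot in `f"model.{x}."` matters: without it, freezing layer 1 would also match `model.10` and `model.11` by substring.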
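- Build the optimizer with separate parameter groups, and set the learning rate and momentum
- The main purpose is to give different kinds of parameters different weight decay: conv/linear weights are decayed, while normalization-layer weights and all biases are not. All three groups start with the same learning rate; they are only treated differently during warmup (see Warmup below)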
```python
import torch
import torch.nn as nn

from utils.general import LOGGER, colorstr  # repo helpers: logger and colored-string formatter


def smart_optimizer(model, name="Adam", lr=0.001, momentum=0.9, decay=1e-5):
    """Initializes a smart optimizer for YOLOv3 with custom parameter groups for different weight decays and biases."""
    g = [], [], []  # optimizer parameter groups
    bn = tuple(v for k, v in nn.__dict__.items() if "Norm" in k)  # normalization layers, i.e. BatchNorm2d()
    for v in model.modules():
        for p_name, p in v.named_parameters(recurse=False):  # only this module's own parameters
            if p_name == "bias":  # bias (no decay)
                g[2].append(p)
            elif p_name == "weight" and isinstance(v, bn):  # weight (no decay)
                g[1].append(p)
            else:
                g[0].append(p)  # weight (with decay)

    if name == "Adam":
        optimizer = torch.optim.Adam(g[2], lr=lr, betas=(momentum, 0.999))  # adjust beta1 to momentum
    elif name == "AdamW":
        optimizer = torch.optim.AdamW(g[2], lr=lr, betas=(momentum, 0.999), weight_decay=0.0)
    elif name == "RMSProp":
        optimizer = torch.optim.RMSprop(g[2], lr=lr, momentum=momentum)
    elif name == "SGD":
        optimizer = torch.optim.SGD(g[2], lr=lr, momentum=momentum, nesterov=True)
    else:
        raise NotImplementedError(f"Optimizer {name} not implemented.")

    optimizer.add_param_group({"params": g[0], "weight_decay": decay})  # add g0 with weight_decay
    optimizer.add_param_group({"params": g[1], "weight_decay": 0.0})  # add g1 (BatchNorm2d weights)
    LOGGER.info(
        f"{colorstr('optimizer:')} {type(optimizer).__name__}(lr={lr}) with parameter groups "
        f'{len(g[1])} weight(decay=0.0), {len(g[0])} weight(decay={decay}), {len(g[2])} bias'
    )
    return optimizer
```
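To see what the three groups look like, here is a self-contained sketch that applies the same grouping rule to a small hypothetical Conv/BN model (`split_param_groups` is an illustration-only name, and the lr, momentum and decay values are arbitrary) and then builds an SGD optimizer the way smart_optimizer does.

```python
import torch
import torch.nn as nn


def split_param_groups(model):
    """Same grouping rule as smart_optimizer: (decayed weights, norm-layer weights, biases)."""
    g = [], [], []
    # Normalization layer classes (BatchNorm2d, LayerNorm, GroupNorm, ...)
    bn = tuple(v for k, v in nn.__dict__.items() if "Norm" in k and isinstance(v, type))
    for m in model.modules():
        for p_name, p in m.named_parameters(recurse=False):
            if p_name == "bias":
                g[2].append(p)  # biases: no decay
            elif p_name == "weight" and isinstance(m, bn):
                g[1].append(p)  # norm weights: no decay
            else:
                g[0].append(p)  # conv/linear weights: decayed
    return g


model = nn.Sequential(nn.Conv2d(3, 8, 3, bias=False), nn.BatchNorm2d(8), nn.Conv2d(8, 4, 1))
g = split_param_groups(model)

opt = torch.optim.SGD(g[2], lr=0.01, momentum=0.937, nesterov=True)  # group 0: biases
opt.add_param_group({"params": g[0], "weight_decay": 5e-4})  # group 1: decayed weights
opt.add_param_group({"params": g[1], "weight_decay": 0.0})  # group 2: norm weights

print([len(x) for x in g])  # [2, 1, 2]
print([(pg["lr"], pg["weight_decay"]) for pg in opt.param_groups])
```

Every group inherits the same `lr`; only `weight_decay` differs. Because the biases are passed to the constructor, they end up as group 0, which is exactly what the warmup code relies on when it checks `j == 0`.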
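- Create a learning-rate scheduler
- LambdaLR multiplies each group's initial learning rate by lf(epoch), so the learning rate follows either a cosine curve (--cos-lr) or a straight line from lr0 down to lr0 * lrf over the training epochs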
```python
# Scheduler
if opt.cos_lr:
    lf = one_cycle(1, hyp["lrf"], epochs)  # cosine 1->hyp['lrf']
else:
    lf = lambda x: (1 - x / epochs) * (1.0 - hyp["lrf"]) + hyp["lrf"]  # linear
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)  # plot_lr_scheduler(optimizer, scheduler, epochs)
```
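The two lf curves can be compared directly in plain Python. This is a sketch under assumptions: `one_cycle` is rewritten here as a half-cosine ramp from y1 to y2, which is what the comment `cosine 1->hyp['lrf']` describes, but it is not imported from the repo; epochs, lrf and lr0 are demo values.

```python
import math


def one_cycle(y1=0.0, y2=1.0, steps=100):
    """Half-cosine ramp from y1 (x=0) to y2 (x=steps); assumed to match the repo helper of the same name."""
    return lambda x: ((1 - math.cos(x * math.pi / steps)) / 2) * (y2 - y1) + y1


def linear(lrf, epochs):
    """Linear ramp from 1.0 (x=0) to lrf (x=epochs), same formula as the `else` branch above."""
    return lambda x: (1 - x / epochs) * (1.0 - lrf) + lrf


epochs, lrf, lr0 = 300, 0.01, 0.01  # demo values
cos_lf = one_cycle(1, lrf, epochs)
lin_lf = linear(lrf, epochs)

# LambdaLR sets lr = initial_lr * lf(epoch) for every parameter group
for epoch in (0, 75, 150, 300):
    print(epoch, round(lr0 * cos_lf(epoch), 6), round(lr0 * lin_lf(epoch), 6))
```

Both curves start at 1.0 and end at lrf, and for these values they meet at the midpoint (0.505), but the cosine curve keeps the learning rate higher early on (about 0.855 vs 0.7525 of lr0 at epoch 75) and falls faster around the middle.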
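- In deep learning, EMA (exponential moving average) is often applied to the model parameters to improve test metrics and make the model more robust.
- EMA effectively multiplies the gradient-descent step taken at step $i$ by a weight coefficient $1-\alpha^{n-i}$, where $\alpha$ is the EMA decay and $n$ is the current step. The most recent steps get the smallest weights, so this amounts to a form of learning rate decay.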
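The weight coefficient can be checked numerically. The sketch below (demo values $\alpha = 0.9$, $n = 20$, chosen arbitrarily) runs a plain update sequence $\theta_{t+1} = \theta_t - g_t$, where $g_t$ stands for the whole step (learning rate times gradient), keeps an EMA initialized at $\theta_1$, and compares it with the closed form $\theta'_n = \theta_1 - \sum_{i=1}^{n-1} (1-\alpha^{n-i})\, g_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n = 0.9, 20  # EMA decay and number of steps (demo values)
theta1 = rng.normal(size=5)  # initial parameters theta_1
steps = rng.normal(size=(n - 1, 5))  # g_1..g_{n-1}: the update (lr * gradient) applied at each step

# Plain training trajectory plus an EMA that starts from theta_1
theta = theta1.copy()
ema = theta1.copy()
for g in steps:
    theta = theta - g  # theta_{t+1} = theta_t - g_t
    ema = alpha * ema + (1 - alpha) * theta  # EMA update after each step

# Closed form: step i is re-weighted by (1 - alpha**(n - i))
i = np.arange(1, n)  # i = 1..n-1
weights = 1 - alpha ** (n - i)
closed_form = theta1 - (weights[:, None] * steps).sum(axis=0)

print(np.allclose(ema, closed_form))  # True
print(weights[0], weights[-1])  # older steps keep more weight; the newest keeps only 1 - alpha
```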
```python
ema = ModelEMA(model) if RANK in {-1, 0} else None  # only the main process (RANK -1 or 0) keeps an EMA copy
```
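ModelEMA itself is not shown in this post. Below is a simplified sketch of a YOLO-style model EMA, written from my understanding of the repo's design rather than copied from it: `SimpleModelEMA` is a hypothetical name, and the `decay=0.9999` / `tau=2000` defaults and the ramped decay formula are assumptions to check against utils/torch_utils.py. `RANK -1` means no DDP and `RANK 0` is the DDP master, which is why only that process builds the copy.

```python
import math
from copy import deepcopy

import torch
import torch.nn as nn


class SimpleModelEMA:
    """Simplified sketch of a YOLO-style model EMA (hypothetical class, not the repo's ModelEMA)."""

    def __init__(self, model, decay=0.9999, tau=2000):
        self.ema = deepcopy(model).eval()  # separate copy that is never trained directly
        self.updates = 0  # number of EMA updates so far
        # Assumed ramp: decay grows from ~0 toward `decay`, so early on the EMA tracks the raw weights
        self.decay = lambda x: decay * (1 - math.exp(-x / tau))
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        self.updates += 1
        d = self.decay(self.updates)
        msd = model.state_dict()
        for k, v in self.ema.state_dict().items():
            if v.dtype.is_floating_point:  # skip integer buffers such as num_batches_tracked
                v.mul_(d).add_((1 - d) * msd[k].detach())  # ema = d * ema + (1 - d) * model


model = nn.Linear(3, 1)
ema = SimpleModelEMA(model)
old = ema.ema.weight.clone()
with torch.no_grad():
    model.weight.add_(1.0)  # pretend an optimizer step moved the weights by +1
ema.update(model)
d = ema.decay(1)
print(d, torch.allclose(ema.ema.weight, d * old + (1 - d) * model.weight))
```

With the ramp, `d` is tiny after the first update (about 0.0005 here), so the EMA starts out almost equal to the live weights and only becomes a long-horizon average later in training.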
- EarlyStopping
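- When initializing the EarlyStopping class you pass a patience parameter: if fitness has not improved for that many epochs, training stops. The class default is 30, and passing 0 disables early stopping (patience becomes infinite).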
```python
from utils.general import LOGGER  # repo logger used below


class EarlyStopping:
    # YOLOv3 simple early stopper
    def __init__(self, patience=30):
        """Initializes EarlyStopping to monitor training, halting if no improvement in 'patience' epochs, defaulting to
        30.
        """
        self.best_fitness = 0.0  # i.e. mAP
        self.best_epoch = 0
        self.patience = patience or float("inf")  # epochs to wait after fitness stops improving to stop
        self.possible_stop = False  # possible stop may occur next epoch

    def __call__(self, epoch, fitness):
        """Updates stopping criteria based on fitness; returns True to stop if no improvement in 'patience' epochs."""
        if fitness >= self.best_fitness:  # >= 0 to allow for early zero-fitness stage of training
            self.best_epoch = epoch
            self.best_fitness = fitness
        delta = epoch - self.best_epoch  # epochs without improvement
        self.possible_stop = delta >= (self.patience - 1)  # possible stop may occur next epoch
        stop = delta >= self.patience  # stop training if patience exceeded
        if stop:
            LOGGER.info(
                f"Stopping training early as no improvement observed in last {self.patience} epochs. "
                f"Best results observed at epoch {self.best_epoch}, best model saved as best.pt.\n"
                f"To update EarlyStopping(patience={self.patience}) pass a new patience value, "
                f"i.e. `python train.py --patience 300` or use `--patience 0` to disable EarlyStopping."
            )
        return stop
```
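To see how patience is counted, here is a stripped-down copy of the same comparison logic with logging removed (`TinyEarlyStopping` and `first_stop_epoch` are hypothetical names), run over a made-up per-epoch fitness sequence. In train.py the stopper is called once per epoch with that epoch's fitness.

```python
class TinyEarlyStopping:
    """Hypothetical stripped-down copy of the EarlyStopping logic above (logging removed)."""

    def __init__(self, patience=30):
        self.best_fitness = 0.0
        self.best_epoch = 0
        self.patience = patience or float("inf")  # patience=0 -> never stop

    def __call__(self, epoch, fitness):
        if fitness >= self.best_fitness:  # ties also count as "improvement"
            self.best_epoch, self.best_fitness = epoch, fitness
        return epoch - self.best_epoch >= self.patience  # epochs since the best result


def first_stop_epoch(fitness_per_epoch, patience):
    """Return the epoch at which training would stop, or None if it runs to the end."""
    stopper = TinyEarlyStopping(patience)
    for epoch, fi in enumerate(fitness_per_epoch):
        if stopper(epoch, fi):
            return epoch
    return None


fitness = [0.10, 0.30, 0.35, 0.34, 0.33, 0.30, 0.36]  # made-up per-epoch fitness values
print(first_stop_epoch(fitness, patience=3))  # 5: three epochs without beating 0.35 from epoch 2
print(first_stop_epoch(fitness, patience=0))  # None: patience=0 disables early stopping
```

Note that the late improvement at epoch 6 is never seen with `patience=3`; a larger patience trades training time for robustness to such plateaus.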
- Warmup
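- While the integrated batch count ni is still within the first nw warmup iterations, the code loops over all of the optimizer's parameter groups. smart_optimizer split the model parameters into 3 groups, and the bias group is group 0 because it is passed to the optimizer constructor first. For the bias group the learning rate falls from hyp["warmup_bias_lr"] (0.1) to the scheduled initial learning rate initial_lr * lf(epoch), which is lr0 at epoch 0; every other group's learning rate rises from 0.0 to that same value. Momentum is ramped from warmup_momentum to momentum in the same way, and the gradient-accumulation count accumulate grows from 1 to nbs / batch_size.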
```python
if ni <= nw:
    xi = [0, nw]  # x interp
    # compute_loss.gr = np.interp(ni, xi, [0.0, 1.0])  # iou loss ratio (obj_loss = 1.0 or iou)
    accumulate = max(1, np.interp(ni, xi, [1, nbs / batch_size]).round())
    for j, x in enumerate(optimizer.param_groups):
        # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0
        x["lr"] = np.interp(ni, xi, [hyp["warmup_bias_lr"] if j == 0 else 0.0, x["initial_lr"] * lf(epoch)])
        if "momentum" in x:
            x["momentum"] = np.interp(ni, xi, [hyp["warmup_momentum"], hyp["momentum"]])
```
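The warmup ramps can be traced with the same `np.interp` calls outside of training. This is a sketch with assumed demo values: the hyp dictionary, nw = 1000, nbs = 64, batch_size = 16, and lf(epoch) = 1.0 (both schedulers above give 1.0 at epoch 0) are chosen for illustration; in train.py nw is derived from hyp["warmup_epochs"] and the number of batches per epoch. `warmup_state` is a hypothetical helper.

```python
import numpy as np

# Assumed demo values, chosen for illustration
hyp = {"lr0": 0.01, "warmup_bias_lr": 0.1, "warmup_momentum": 0.8, "momentum": 0.937}
nw = 1000  # number of warmup iterations
nbs, batch_size = 64, 16  # nominal batch size and actual batch size
lf_epoch = 1.0  # scheduler factor lf(epoch) at epoch 0


def warmup_state(ni):
    """Hypothetical helper: (bias lr, other lr, momentum, accumulate) at integrated batch index ni."""
    xi = [0, nw]
    target_lr = hyp["lr0"] * lf_epoch  # plays the role of x["initial_lr"] * lf(epoch)
    bias_lr = np.interp(ni, xi, [hyp["warmup_bias_lr"], target_lr])  # group 0 falls toward lr0
    other_lr = np.interp(ni, xi, [0.0, target_lr])  # other groups rise from 0.0
    momentum = np.interp(ni, xi, [hyp["warmup_momentum"], hyp["momentum"]])
    accumulate = max(1, np.interp(ni, xi, [1, nbs / batch_size]).round())  # ramps toward 64 / 16 = 4
    return bias_lr, other_lr, momentum, accumulate


for ni in (0, 250, 1000):
    print(ni, warmup_state(ni))
```

At ni = 250 the bias group is at 0.0775 while the other groups are only at 0.0025; by ni = nw all groups meet at lr0, momentum reaches its final value, and gradients are accumulated over 4 batches to mimic the nominal batch size of 64.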
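- I won't go through the remaining details; read the source yourself. The YOLOv5 code is almost identical to YOLOv3's
- If you can look at the outline and already know every part by heart, congratulations; I hope you can apply it to new problems and go on to even greater things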