Study notes: reading the YOLOv3 code, train.py

william_myq

Published 2024-07-02 19:48:21

Tags: study notes, YOLO

Article link: https://blog.csdn.net/xiujiti6871/article/details/140135080

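Try to recall train.py from the outline below; it still has plenty of small details worth learning.
It is worth asking why the official code is written so elegantly.
Other people's code looks stunning, so why does my own always fall a little short?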

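1. parse_opt function

2. main function

main function: print arguments / check the environment
main function: whether to resume training from a checkpoint
main function: whether to use distributed training
main function: whether to run hyperparameter evolution (genetic-algorithm tuning)

3. train function

train function: basic configuration
train function: model loading / resuming from a checkpoint
train function: freeze training / choosing the frozen layers
train function: image size / batch size settings
train function: optimizer choice / parameter-group settings
train function: learning rate / EMA / normalization / single-machine multi-GPU
train function: data loading / anchor adjustment
train function: training setup / multi-scale training / warmup
train function: end of training / logging / saving results

4. run function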

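If recalling it still leaves gaps, there is a blog post that I think is well written and recommend to everyone.
What follows only shares some of the technical highlights; for the rest, read the official code.

"yolov5代码解读之train.py【训练模型】" (YOLOv5 code walkthrough: train.py, training the model), CSDN blog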

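Freeze training
Layers are frozen by setting requires_grad to False on their parameters. Passing a single value N as freeze freezes the first N layers (prefixes model.0. through model.{N-1}.), while passing several values freezes exactly those layer indices.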

# Freeze
freeze = [f"model.{x}." for x in (freeze if len(freeze) > 1 else range(freeze[0]))]  # layers to freeze
for k, v in model.named_parameters():
	v.requires_grad = True  # train all layers
	# v.register_hook(lambda x: torch.nan_to_num(x))  # NaN to 0 (commented for erratic training results)
	if any(x in k for x in freeze):
		LOGGER.info(f"freezing {k}")
		v.requires_grad = False
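To see what the freeze list comprehension produces, here is a small standalone sketch that applies the same prefix matching to a plain list of parameter names. The names in `param_names` and the helpers `freeze_prefixes` and `frozen_names` are made up for illustration; in train.py the names come from `model.named_parameters()`.

```python
def freeze_prefixes(freeze):
    """Mirror the list comprehension in train.py: a single value N means layers 0..N-1."""
    layers = freeze if len(freeze) > 1 else range(freeze[0])
    return [f"model.{x}." for x in layers]


def frozen_names(param_names, freeze):
    """Return the parameter names that would end up with requires_grad = False."""
    prefixes = freeze_prefixes(freeze)
    return [k for k in param_names if any(p in k for p in prefixes)]


# Hypothetical parameter names, for illustration only.
param_names = [
    "model.0.conv.weight",
    "model.1.conv.weight",
    "model.1.bn.bias",
    "model.10.conv.weight",
    "model.11.conv.weight",
]

first_two = frozen_names(param_names, [2])   # freeze layers 0 and 1
picked = frozen_names(param_names, [1, 11])  # freeze layers 1 and 11 only
nothing = frozen_names(param_names, [0])     # range(0) is empty, so nothing is frozen
```

The trailing dot in each prefix matters: "model.1." does not match "model.10.conv.weight", so freezing layer 1 leaves layers 10 and 11 trainable.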

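Parameter groups in the optimizer, with learning rate and momentum settings
The main purpose is to treat different kinds of model parameters differently during training: biases and normalization-layer weights get no weight decay, while all other weights are decayed. Keeping the groups separate also lets warmup give the bias group its own learning rate, which generally helps model performance.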

def smart_optimizer(model, name="Adam", lr=0.001, momentum=0.9, decay=1e-5):
    """Initializes a smart optimizer for YOLOv3 with custom parameter groups for different weight decays and biases."""
    g = [], [], []  # optimizer parameter groups
    bn = tuple(v for k, v in nn.__dict__.items() if "Norm" in k)  # normalization layers, i.e. BatchNorm2d()
    for v in model.modules():
        for p_name, p in v.named_parameters(recurse=0):
            if p_name == "bias":  # bias (no decay)
                g[2].append(p)
            elif p_name == "weight" and isinstance(v, bn):  # weight (no decay)
                g[1].append(p)
            else:
                g[0].append(p)  # weight (with decay)

    if name == "Adam":
        optimizer = torch.optim.Adam(g[2], lr=lr, betas=(momentum, 0.999))  # adjust beta1 to momentum
    elif name == "AdamW":
        optimizer = torch.optim.AdamW(g[2], lr=lr, betas=(momentum, 0.999), weight_decay=0.0)
    elif name == "RMSProp":
        optimizer = torch.optim.RMSprop(g[2], lr=lr, momentum=momentum)
    elif name == "SGD":
        optimizer = torch.optim.SGD(g[2], lr=lr, momentum=momentum, nesterov=True)
    else:
        raise NotImplementedError(f"Optimizer {name} not implemented.")

    optimizer.add_param_group({"params": g[0], "weight_decay": decay})  # add g0 with weight_decay
    optimizer.add_param_group({"params": g[1], "weight_decay": 0.0})  # add g1 (BatchNorm2d weights)
    LOGGER.info(
        f"{colorstr('optimizer:')} {type(optimizer).__name__}(lr={lr}) with parameter groups "
        f'{len(g[1])} weight(decay=0.0), {len(g[0])} weight(decay={decay}), {len(g[2])} bias'
    )
    return optimizer
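As a rough check of the grouping rule, the sketch below runs the same three-way split on a toy network and inspects the resulting parameter groups. The network, the learning rate and the decay value are assumptions chosen for illustration, and the normalization check is narrowed to `nn.BatchNorm2d` to keep the example short.

```python
import torch
from torch import nn

# Hypothetical toy network, only for illustrating the grouping rule.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),              # has weight and bias
    nn.BatchNorm2d(8),                           # has weight and bias
    nn.Conv2d(8, 4, kernel_size=1, bias=False),  # has weight only
)

decay_w, norm_w, biases = [], [], []
for module in model.modules():
    for p_name, p in module.named_parameters(recurse=False):
        if p_name == "bias":
            biases.append(p)       # biases: no decay
        elif p_name == "weight" and isinstance(module, nn.BatchNorm2d):
            norm_w.append(p)       # normalization weights: no decay
        else:
            decay_w.append(p)      # everything else: decayed

# Same order as smart_optimizer: biases first, then decayed weights, then norm weights.
optimizer = torch.optim.SGD(biases, lr=0.01, momentum=0.9, nesterov=True)
optimizer.add_param_group({"params": decay_w, "weight_decay": 5e-4})
optimizer.add_param_group({"params": norm_w, "weight_decay": 0.0})

group_sizes = [len(g["params"]) for g in optimizer.param_groups]
group_decays = [g["weight_decay"] for g in optimizer.param_groups]
```

Because the optimizer is constructed from the bias list, the biases sit in `param_groups[0]`. The warmup code relies on exactly this ordering when it tests `j == 0`.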

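Creating a learning rate scheduler
It adjusts the learning rate as a function of the epoch. lf(x) is a multiplier on the initial learning rate that goes from 1 at epoch 0 to hyp["lrf"] at the final epoch, following either a cosine or a linear curve.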

# Scheduler
if opt.cos_lr:
	lf = one_cycle(1, hyp["lrf"], epochs)  # cosine 1->hyp['lrf']
else:
	lf = lambda x: (1 - x / epochs) * (1.0 - hyp["lrf"]) + hyp["lrf"]  # linear
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)  # plot_lr_scheduler(optimizer, scheduler, epochs)
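The two schedules can be compared without PyTorch, since `LambdaLR` simply multiplies each group's initial learning rate by `lf(epoch)`. The sketch below is only an approximation of the setup: the body of `one_cycle` is my assumption about the form of the repository helper, `linear` is a hypothetical wrapper around the lambda from the else branch, and the values of `epochs`, `lrf` and `lr0` are illustrative.

```python
import math


def one_cycle(y1=0.0, y2=1.0, steps=100):
    """Cosine ramp from y1 (x = 0) to y2 (x = steps); assumed form of the repo helper."""
    return lambda x: ((1 - math.cos(x * math.pi / steps)) / 2) * (y2 - y1) + y1


def linear(lrf, epochs):
    """Linear ramp from 1 (x = 0) to lrf (x = epochs), as in the else branch."""
    return lambda x: (1 - x / epochs) * (1.0 - lrf) + lrf


epochs, lrf, lr0 = 100, 0.01, 0.01  # illustrative values, not from a real hyp file
cos_lf = one_cycle(1, lrf, epochs)
lin_lf = linear(lrf, epochs)

for epoch in (0, 25, 50, 100):
    # effective learning rate = initial lr * multiplier
    print(epoch, lr0 * cos_lf(epoch), lr0 * lin_lf(epoch))
```

Both curves start at a multiplier of 1 and end at `lrf`; the cosine version stays higher during the first half of training and drops faster in the second half.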

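In deep learning, EMA (exponential moving average) is often applied to the model parameters to improve test metrics and make the model more robust.
With decay α, the EMA weights after n steps are equivalent to scaling the i-th gradient step by a coefficient 1 - α^{n-i}, which acts like a learning rate decay on the most recent steps.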

ema = ModelEMA(model) if RANK in {-1, 0} else None
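The weighting view of EMA can be checked numerically on a single scalar parameter. The sketch assumes a fixed decay `alpha`, an EMA initialised to the first parameter value (v_1 = θ_1), and made-up update steps `grads` with the learning rate folded in; under those assumptions v_n = θ_1 - Σ_{i=1}^{n-1} (1 - α^{n-i}) g_i. `ModelEMA` applies the same idea to the tensors of a whole model, so this is a simplification, not its implementation.

```python
alpha = 0.9                     # EMA decay (illustrative value)
theta = 1.0                     # theta_1, the initial parameter
grads = [0.3, -0.1, 0.2, 0.05]  # g_1 .. g_{n-1}, made-up update steps

ema = theta                     # v_1 = theta_1
for g in grads:
    theta = theta - g                        # plain gradient step
    ema = alpha * ema + (1 - alpha) * theta  # EMA update

n = len(grads) + 1
# Closed form: every step g_i is scaled by (1 - alpha^(n - i)).
closed_form = 1.0 - sum(
    (1 - alpha ** (n - i)) * g for i, g in enumerate(grads, start=1)
)
print(ema, closed_form)
```

Old steps get a coefficient close to 1 and the latest steps a small one, which is why the averaged weights lag behind and smooth out the raw weights.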

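EarlyStopping
When initializing the EarlyStopping class you set a patience parameter: if fitness has not improved for that many epochs, training stops. The class default is patience=30, and a patience of 0 disables early stopping.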

class EarlyStopping:
    # YOLOv3 simple early stopper
    def __init__(self, patience=30):
        """Initializes EarlyStopping to monitor training, halting if no improvement in 'patience' epochs, defaulting to
        30.
        """
        self.best_fitness = 0.0  # i.e. mAP
        self.best_epoch = 0
        self.patience = patience or float("inf")  # epochs to wait after fitness stops improving to stop
        self.possible_stop = False  # possible stop may occur next epoch

    def __call__(self, epoch, fitness):
        """Updates stopping criteria based on fitness; returns True to stop if no improvement in 'patience' epochs."""
        if fitness >= self.best_fitness:  # >= 0 to allow for early zero-fitness stage of training
            self.best_epoch = epoch
            self.best_fitness = fitness
        delta = epoch - self.best_epoch  # epochs without improvement
        self.possible_stop = delta >= (self.patience - 1)  # possible stop may occur next epoch
        stop = delta >= self.patience  # stop training if patience exceeded
        if stop:
            LOGGER.info(
                f"Stopping training early as no improvement observed in last {self.patience} epochs. "
                f"Best results observed at epoch {self.best_epoch}, best model saved as best.pt.\n"
                f"To update EarlyStopping(patience={self.patience}) pass a new patience value, "
                f"i.e. `python train.py --patience 300` or use `--patience 0` to disable EarlyStopping."
            )
        return stop
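A short simulation shows when the stop signal fires. The class below is a trimmed copy of the one above with logging removed so that it runs standalone, and the fitness values are made up.

```python
class EarlyStopping:
    """Trimmed copy of the class above, without logging, for a standalone demo."""

    def __init__(self, patience=30):
        self.best_fitness = 0.0
        self.best_epoch = 0
        self.patience = patience or float("inf")  # 0 disables early stopping
        self.possible_stop = False

    def __call__(self, epoch, fitness):
        if fitness >= self.best_fitness:  # ties count as an improvement
            self.best_epoch = epoch
            self.best_fitness = fitness
        delta = epoch - self.best_epoch   # epochs without improvement
        self.possible_stop = delta >= (self.patience - 1)
        return delta >= self.patience


fitness_per_epoch = [0.10, 0.25, 0.30, 0.28, 0.29, 0.27, 0.26]  # made-up values
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, fitness in enumerate(fitness_per_epoch):
    if stopper(epoch, fitness):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best_fitness)
```

The best fitness is reached at epoch 2, so with patience=3 the loop stops at epoch 5, three epochs later. `possible_stop` becomes True one epoch before that, at epoch 4.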

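Warmup
While the current iteration ni is still within the first nw warmup iterations, the code loops over all parameter groups of the optimizer. The model's parameters are split into 3 groups. For the bias group (index 0), the learning rate falls during warmup from hyp["warmup_bias_lr"] (0.1) to the scheduled learning rate x["initial_lr"] * lf(epoch), which is close to lr0 early in training. For all other parameters it rises from 0.0 to that same value. If the optimizer has momentum, it is interpolated from hyp["warmup_momentum"] to hyp["momentum"].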

if ni <= nw:
	xi = [0, nw]  # x interp
	# compute_loss.gr = np.interp(ni, xi, [0.0, 1.0])  # iou loss ratio (obj_loss = 1.0 or iou)
	accumulate = max(1, np.interp(ni, xi, [1, nbs / batch_size]).round())
	for j, x in enumerate(optimizer.param_groups):
		# bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0
		x["lr"] = np.interp(ni, xi, [hyp["warmup_bias_lr"] if j == 0 else 0.0, x["initial_lr"] * lf(epoch)])
		if "momentum" in x:
			x["momentum"] = np.interp(ni, xi, [hyp["warmup_momentum"], hyp["momentum"]])
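The interpolation in the warmup loop can be traced by hand with a few iterations. This sketch replaces `np.interp` with a small two-point helper `interp` (my own stand-in, clamped at both ends), assumes `lf(epoch)` is 1 during warmup so the target is simply `lr0`, and uses illustrative hyperparameter values.

```python
def interp(x, xp, fp):
    """Two-point linear interpolation, clamped at the ends like np.interp."""
    x0, x1 = xp
    f0, f1 = fp
    t = min(max((x - x0) / (x1 - x0), 0.0), 1.0)
    return f0 + t * (f1 - f0)


nw = 100                                # warmup iterations (illustrative)
lr0 = 0.01                              # target lr, assuming lf(epoch) == 1
warmup_bias_lr = 0.1                    # starting lr of the bias group
warmup_momentum, momentum = 0.8, 0.937  # illustrative values


def warmup_state(ni):
    """Return (bias lr, lr of the other groups, momentum) at iteration ni."""
    xi = [0, nw]
    bias_lr = interp(ni, xi, [warmup_bias_lr, lr0])  # falls from 0.1 to lr0
    other_lr = interp(ni, xi, [0.0, lr0])            # rises from 0.0 to lr0
    mom = interp(ni, xi, [warmup_momentum, momentum])
    return bias_lr, other_lr, mom


for ni in (0, 50, 100):
    print(ni, warmup_state(ni))
```

Halfway through warmup the bias learning rate is about 0.055 and the other groups are at about 0.005; at ni = nw all groups meet at lr0 and the regular scheduler takes over.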