YOLOv5: model files become larger when training is stopped early

Latest recommended article published on 2024-07-19 23:38:36

wyw0000

Category: yolo. Tags: YOLO, deep learning, computer vision

Article link: https://blog.csdn.net/wyw0000/article/details/139789023

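1. Background

While training the yolov5l-tph-plus model with tph-yolov5, I saw that the model had more or less converged and stopped the training by hand. The resulting last.pt and best.pt turned out to be 488M, whereas a model from a run that finishes normally is only 82M.

2. Cause analysis

2.1 Analysis of the train code

In train.py, the following code runs once training has ended: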

# end training -----------------------------------------------------------------------------------------------------
    if RANK in [-1, 0]:
        LOGGER.info(f'\n{epoch - start_epoch + 1} epochs completed in {(time.time() - t0) / 3600:.3f} hours.')
        for f in last, best:
            if f.exists():
                strip_optimizer(f)  # strip optimizers
                if f is best:
                    LOGGER.info(f'\nValidating {f}...')
                    results, _, _ = val.run(data_dict,
                                            batch_size=batch_size // WORLD_SIZE * 2,
                                            imgsz=imgsz,
                                            model=attempt_load(f, device).half(),
                                            iou_thres=0.65 if is_coco else 0.60,  # best pycocotools results at 0.65
                                            single_cls=single_cls,
                                            dataloader=val_loader,
                                            save_dir=save_dir,
                                            save_json=is_coco,
                                            verbose=True,
                                            plots=True,
                                            callbacks=callbacks,
                                            compute_loss=compute_loss)  # val best model with plots
                    if is_coco:
                        callbacks.run('on_fit_epoch_end', list(mloss) + list(results) + lr, epoch, best_fitness, fi)

        callbacks.run('on_train_end', last, best, plots, epoch, results)
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}")

The call in this block that shrinks the checkpoint files is strip_optimizer(f).

2.2 Analysis of the strip_optimizer function

This function is located in utils/general.py:

def strip_optimizer(f='best.pt', s=''):  # from utils.general import *; strip_optimizer()
    # Strip optimizer from 'f' to finalize training, optionally save as 's'
    x = torch.load(f, map_location=torch.device('cpu'))
    if x.get('ema'):
        x['model'] = x['ema']  # replace model with ema
    for k in 'optimizer', 'training_results', 'wandb_id', 'ema', 'updates':  # keys
        x[k] = None
    x['epoch'] = -1
    x['model'].half()  # to FP16
    for p in x['model'].parameters():
        p.requires_grad = False
    torch.save(x, s or f)
    mb = os.path.getsize(s or f) / 1E6  # filesize
    print(f"Optimizer stripped from {f},{(' saved as %s,' % s) if s else ''} {mb:.1f}MB")

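The function strips the optimizer from the given checkpoint file ('f') and can optionally save the stripped checkpoint as a new file ('s'). In order, it does the following:

Loads the checkpoint file onto the CPU;
If an 'ema' entry is present, replaces 'model' with 'ema';
Sets the values of the 'optimizer', 'training_results', 'wandb_id', 'ema' and 'updates' keys to None;
Sets 'epoch' to -1;
Converts the model to FP16;
Marks every model parameter as not requiring gradients;
Saves the processed checkpoint to 's', or back to 'f' if 's' is empty;
Computes the file size and prints the name and size of the stripped file.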
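
The steps above suggest a simple way to tell whether a checkpoint has already been through strip_optimizer: look at which training-only keys still hold a value. The following is a minimal sketch, not part of YOLOv5. The helper names `leftover_training_keys` and `looks_stripped` are hypothetical, and the two dictionaries are stand-ins for what loading a real checkpoint would return.

```python
# Keys that strip_optimizer() sets to None (copied from the function above).
TRAINING_ONLY_KEYS = ('optimizer', 'training_results', 'wandb_id', 'ema', 'updates')


def leftover_training_keys(ckpt):
    """Return the training-only keys that still hold a value in a checkpoint dict."""
    return [k for k in TRAINING_ONLY_KEYS if ckpt.get(k) is not None]


def looks_stripped(ckpt):
    """Heuristic: no training-only data left and 'epoch' reset to -1."""
    return not leftover_training_keys(ckpt) and ckpt.get('epoch') == -1


# Stand-in dictionaries (assumptions for illustration only); object() replaces the model.
interrupted = {'epoch': 57, 'model': object(), 'ema': object(), 'updates': 1200,
               'optimizer': {'state': {}}, 'wandb_id': None}
finished = {'epoch': -1, 'model': object(), 'ema': None, 'updates': None,
            'optimizer': None, 'training_results': None, 'wandb_id': None}

print(leftover_training_keys(interrupted))  # ['optimizer', 'ema', 'updates']
print(looks_stripped(finished))             # True
```

For a real file, the dictionary would come from `torch.load(f, map_location=torch.device('cpu'))` as in strip_optimizer, run from inside the repository so that the pickled model classes can be imported. Recent PyTorch releases may additionally need `weights_only=False` for such checkpoints.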

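Stopping the run by hand means this function is never called, so the checkpoints have not been through its FP16 conversion and still contain the optimizer state and the other training-only entries. That is why the files are so much larger.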
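
To get a feel for how much each component can add, here is a back-of-the-envelope estimator. It is only a sketch under stated assumptions: the parameter count of about 41 million is inferred from the 82M figure at 2 bytes per FP16 weight, and the particular mix of stored weight copies and optimizer state in the second call is hypothetical. What a given checkpoint really contains has to be checked by inspecting the file.

```python
def estimate_checkpoint_mb(n_params, weight_bytes=2, copies=1,
                           optimizer_slots=0, optimizer_bytes=4):
    """Rough size in MB (1e6 bytes) of the tensors stored in a checkpoint.

    n_params        : number of model parameters
    weight_bytes    : bytes per stored weight (2 for FP16, 4 for FP32)
    copies          : number of weight copies stored (e.g. model + EMA = 2)
    optimizer_slots : per-parameter state tensors kept by the optimizer
                      (0 = stripped, 1 = momentum only, 2 = two moment buffers)
    optimizer_bytes : bytes per optimizer state value
    """
    weights = n_params * weight_bytes * copies
    optimizer = n_params * optimizer_slots * optimizer_bytes
    return (weights + optimizer) / 1e6


N_PARAMS = 41_000_000  # assumption: 82M file / 2 bytes per FP16 weight

# Stripped checkpoint: one FP16 copy of the weights, no optimizer state.
print(estimate_checkpoint_mb(N_PARAMS))  # 82.0

# Hypothetical unstripped mix: two FP16 copies plus two FP32 optimizer buffers.
print(estimate_checkpoint_mb(N_PARAMS, copies=2, optimizer_slots=2))  # 492.0
```

Other combinations can land in the same range, so the estimate only shows that the extra entries are large enough to account for a several-fold size difference. It does not identify which of them is responsible.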

3. Verification

To check this, write a short script that calls strip_optimizer on the 488M checkpoints. The code is as follows:

from pathlib import Path
import sys

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory (this script is placed in the repo root)
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH so that `utils` can be imported

from utils.general import strip_optimizer  # the only helper this script needs


if __name__ == '__main__':
    save_dir = Path(r'E:/code/other/tph-yolov5-main/runs/train/v5l-tph-plus3/')
    w = save_dir / 'weights'  # weights dir
    last, best = w / 'last.pt', w / 'best.pt'
    for f in last, best:
        if f.exists():  # skip a checkpoint that was never written
            strip_optimizer(f)  # overwrites f in place with the stripped checkpoint
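
To confirm the effect, the file size can be recorded before and after running the script. The sketch below uses only the standard library. `size_mb` and `shrink_report` are hypothetical helper names, and the 488 and 82 figures in the last line are the ones quoted in the background section for the interrupted and the normally finished run, not a measured result of the script above.

```python
import os
import tempfile


def size_mb(path):
    """File size in MB, using the same 1E6 divisor as strip_optimizer()."""
    return os.path.getsize(path) / 1E6


def shrink_report(before_mb, after_mb):
    """One-line summary of how much a file shrank."""
    saved = before_mb - after_mb
    return f'{before_mb:.1f}MB -> {after_mb:.1f}MB ({saved / before_mb:.0%} smaller)'


# Demo on a throwaway 2,000,000-byte file so the snippet runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, 'demo.bin')
    with open(demo, 'wb') as fh:
        fh.write(b'\0' * 2_000_000)
    demo_mb = size_mb(demo)

print(demo_mb)                      # 2.0
print(shrink_report(488.0, 82.0))   # 488.0MB -> 82.0MB (83% smaller)
```

In practice, `size_mb(f)` would be called on last.pt and best.pt once before and once after strip_optimizer, and the two values passed to `shrink_report`.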