Notes on AMP and cuDNN in detectron2

电饭锅22

Category: detectron2. Tags: python, deep learning, pytorch

Article link: https://blog.csdn.net/wenghd22/article/details/111865334

Contents

1. AMP mixed precision
2. CUDNN_BENCHMARK

1. AMP mixed precision

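Put simply, AMP (automatic mixed precision) runs part of the float32 computation in float16, which speeds up training and inference and reduces GPU memory use.
PyTorch has shipped AMP natively since version 1.6, and detectron2 added AMP training in version 0.3.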

v0.3
Features & Improvements: Support mixed precision in training (using cfg.SOLVER.AMP.ENABLED) and inference.
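
To make the float32 to float16 trade-off concrete, here is a minimal sketch that uses only the Python standard library (the `e` format of the `struct` module is IEEE 754 half precision). The helper name `to_half` and the scale factor 65536 are illustrative assumptions. This is not how PyTorch implements AMP; it only shows that half precision drops very small values, which is why mixed-precision training is usually paired with loss scaling.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

# float16 keeps roughly 3 significant decimal digits, so 0.1 is stored inexactly.
tenth = to_half(0.1)

# A very small gradient underflows to zero in float16 ...
tiny_grad = 1e-8
underflowed = to_half(tiny_grad)

# ... but survives if it is scaled up first and scaled back down in full precision.
scale = 65536.0  # illustrative scale factor (assumption)
recovered = to_half(tiny_grad * scale) / scale

print(tenth, underflowed, recovered)
```

The scaled value comes back close to the original, while the unscaled one is lost entirely.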

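AMP relies on the Tensor Cores in the GPU; on a card without them it brings little benefit. NVIDIA cards from the 20 series onward should have them, but check the specifications of your own card to be sure.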
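
One rough way to check is the CUDA compute capability of the GPU: Tensor Cores were introduced with Volta, which reports compute capability 7.0. The sketch below treats `major >= 7` as a heuristic only. The function names are made up for illustration, and some cards (for example the GTX 16 series) report 7.5 without having Tensor Cores, so the spec sheet remains the final word.

```python
from typing import Optional, Tuple

def likely_has_tensor_cores(capability: Tuple[int, int]) -> bool:
    """Heuristic: compute capability 7.0 (Volta) or newer."""
    major, _minor = capability
    return major >= 7

def current_gpu_likely_has_tensor_cores() -> Optional[bool]:
    """Apply the heuristic to the current GPU.

    Returns None when PyTorch or a CUDA device is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return likely_has_tensor_cores(torch.cuda.get_device_capability())

print(likely_has_tensor_cores((7, 5)))  # e.g. an RTX 20-series card
print(likely_has_tensor_cores((6, 1)))  # e.g. a GTX 10-series card
print(current_gpu_likely_has_tensor_cores())
```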

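In my detectron2 runs AMP did not save much GPU memory, but training and inference time dropped noticeably. The size of the gain depends on the GPU, the model, the input size, and so on.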

Using AMP for inference requires modifying some code yourself. In engine/defaults.py, inside class DefaultTrainer(TrainerBase):

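@classmethod
def test(cls, cfg, model, evaluators=None):
    # ... existing code unchanged ...
    # Pass cfg through to the function below, which is the one to modify
    results_i = inference_on_dataset(model, data_loader, evaluator, cfg)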

Then add the following lines to inference_on_dataset in evaluator.py:

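from torch.cuda.amp import autocast

def inference_on_dataset(model, data_loader, evaluator, cfg):
    # cfg is passed in so that AMP can be switched on or off at inference time.
    # Where the function currently runs `outputs = model(inputs)`, use instead:
    if cfg.SOLVER.AMP.ENABLED:
        with autocast():
            outputs = model(inputs)
    else:
        outputs = model(inputs)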
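
Both branches of the modification call the model in exactly the same way, so the switch can also be written by choosing the context manager up front. The following is a sketch of that pattern, not detectron2 code: `amp_context` is a hypothetical helper, and `fake_autocast` is a stand-in so the snippet runs without a GPU. In real use the factory would be `torch.cuda.amp.autocast`.

```python
from contextlib import contextmanager, nullcontext

def amp_context(enabled, autocast_factory):
    """Return an autocast context when enabled, otherwise a no-op context."""
    return autocast_factory() if enabled else nullcontext()

# Stand-in for torch.cuda.amp.autocast that just records when it is active.
events = []

@contextmanager
def fake_autocast():
    events.append("enter")
    try:
        yield
    finally:
        events.append("exit")

def run_inference(amp_enabled):
    with amp_context(amp_enabled, fake_autocast):
        events.append("forward")  # placeholder for: outputs = model(inputs)

run_inference(amp_enabled=True)   # forward pass wrapped in the context
run_inference(amp_enabled=False)  # forward pass with no wrapping
print(events)
```

With PyTorch installed, the equivalent call would be `with amp_context(cfg.SOLVER.AMP.ENABLED, autocast): outputs = model(inputs)`. `autocast` also accepts an `enabled` argument, so `with autocast(enabled=cfg.SOLVER.AMP.ENABLED):` achieves the same thing in one line.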

2. CUDNN_BENCHMARK

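When the input shape (N, C, H, W) stays the same from batch to batch, cuDNN automatically searches for the fastest convolution algorithm, which speeds up training and inference.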

# Benchmark different cudnn algorithms.
# If input images have very different sizes, this option will have large overhead
# for about 10k iterations. It usually hurts total time, but can benefit for certain models.
# If input images have the same or similar sizes, benchmark is often helpful.
_C.CUDNN_BENCHMARK = False
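
The overhead mentioned in the comment comes from the search being repeated for every new input shape. The toy model below illustrates that idea only. It is not how cuDNN is implemented, and `ToyBenchmarkCache` and its members are hypothetical names. Each previously unseen (N, C, H, W) shape triggers one simulated benchmark, while repeated shapes reuse the cached choice.

```python
class ToyBenchmarkCache:
    """Counts how often a 'benchmark' would run for a stream of input shapes."""

    def __init__(self):
        self.best_algo = {}      # shape -> chosen algorithm (placeholder value)
        self.benchmark_runs = 0

    def conv(self, shape):
        if shape not in self.best_algo:
            # First time this shape is seen: pay the one-off search cost.
            self.benchmark_runs += 1
            self.best_algo[shape] = "algo_for_%dx%d" % (shape[2], shape[3])
        return self.best_algo[shape]

# Fixed-size inputs: one benchmark, then 99 cache hits.
fixed = ToyBenchmarkCache()
for _ in range(100):
    fixed.conv((8, 3, 800, 800))

# Varying sizes: every new (H, W) pays the search cost again.
varied = ToyBenchmarkCache()
for i in range(100):
    varied.conv((8, 3, 640 + 8 * i, 800))

print(fixed.benchmark_runs, varied.benchmark_runs)
```

In detectron2 the flag is turned on with `cfg.CUDNN_BENCHMARK = True`, which corresponds to PyTorch's `torch.backends.cudnn.benchmark` setting.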

In my runs this made training and inference roughly 10% faster; the exact gain should vary somewhat with the GPU and the settings.
