Problems Encountered When Modifying YOLO Modules and How to Fix Them

TracyGC

Last modified 2024-03-21 21:07:13

Category: Research and Thinking. Tags: YOLO

First published 2024-03-20 10:36:36

Article link: https://blog.csdn.net/tracygc/article/details/136868488

1. grid_sampler_2d_backward_cuda

While modifying YOLOv9, you will sometimes hit this error: RuntimeError: grid_sampler_2d_backward_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, ...

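Fix:

The problem is in the backward pass. In train_dual.py (the training script), search for scaler.scale(loss).backward() and turn off deterministic algorithms right before it. Note that this stays off for the rest of the run, so training is no longer guaranteed to be bit-for-bit reproducible.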

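            # grid_sampler_2d_backward_cuda has no deterministic implementation,
            # so turn deterministic mode off before the backward pass
            torch.use_deterministic_algorithms(False)
            # Backward
            scaler.scale(loss).backward()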

Training now runs successfully!
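If you would rather keep deterministic mode on for the rest of training, one option is to switch it off only around the backward pass and restore it afterwards. The following is a minimal sketch of that idea, not the code from train_dual.py: the helper name determinism_disabled is hypothetical, and the tiny loss is a CPU stand-in for the real scaler.scale(loss).backward() call.

```python
from contextlib import contextmanager

import torch


@contextmanager
def determinism_disabled():
    """Temporarily turn off deterministic algorithms, then restore the old setting."""
    was_enabled = torch.are_deterministic_algorithms_enabled()
    was_warn_only = torch.is_deterministic_algorithms_warn_only_enabled()
    torch.use_deterministic_algorithms(False)
    try:
        yield
    finally:
        # Put back whatever the training script had configured before.
        torch.use_deterministic_algorithms(was_enabled, warn_only=was_warn_only)


# Stand-in for the training step; the real loop calls scaler.scale(loss).backward().
torch.use_deterministic_algorithms(True)
weight = torch.ones(3, requires_grad=True)
loss = (weight * 2).sum()
with determinism_disabled():
    loss.backward()
```

The error message itself names a second option: calling torch.use_deterministic_algorithms(True, warn_only=True) once at startup, which makes PyTorch warn instead of raising when an operation has no deterministic implementation.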

2. DataLoader worker

While modifying YOLOv9, you will sometimes hit this error: RuntimeError: DataLoader worker (pid(s) 10556, 2552, 32032, 34540, 34092, 24356) exited unexpectedly

Cause: the DataLoader worker processes in the CUDA virtual environment ran out of shared memory.

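Fix:

In my case, I had left PyCharm open with too many runs still going and never shut the machine down. Closing all the leftover python processes in Task Manager solved it.

If that does not apply to you, you can either:

reduce the batch size, or
comment out numworkers = 1 so that data loading no longer uses multiple processes.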
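The two settings above can be illustrated with a plain PyTorch DataLoader. This is a minimal sketch under stated assumptions: the toy TensorDataset is hypothetical and merely stands in for the YOLO training images, and the batch size of 4 is an arbitrary small value.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset standing in for the YOLO training images.
dataset = TensorDataset(torch.randn(32, 3, 8, 8), torch.zeros(32, dtype=torch.long))

# num_workers=0 loads every batch in the main process, so there are no worker
# processes that can exit unexpectedly; a smaller batch_size lowers the memory
# needed per step.
loader = DataLoader(dataset, batch_size=4, num_workers=0, shuffle=True)

num_batches = sum(1 for _ in loader)
print(num_batches)  # 32 samples / 4 per batch = 8
```

In the YOLOv9 training scripts the corresponding knobs are usually exposed as command-line options (assumed here to be --batch-size and --workers; check the argument parser in your copy of the script).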

3. CUDA out of memory

Error: RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 8.00 GiB total capacity; 5.91 GiB already allocated; 0 bytes free; 6.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

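Fix: same as in section 2 (close leftover Python processes, reduce the batch size, or stop using multiple worker processes).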
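The error message also points at max_split_size_mb and PYTORCH_CUDA_ALLOC_CONF. The sketch below shows one way to set that variable from Python. The helper name configure_cuda_allocator and the value 128 are assumptions for illustration, not settings from this article. In the message quoted above, reserved memory (6.06 GiB) is only slightly larger than allocated memory (5.91 GiB), so fragmentation is probably not the main cause there and a smaller batch size is more likely to help.

```python
import os


def configure_cuda_allocator(max_split_size_mb=128):
    """Set PYTORCH_CUDA_ALLOC_CONF unless the user has already chosen a value."""
    value = f"max_split_size_mb:{max_split_size_mb}"
    # setdefault keeps an existing setting and returns whichever value is active.
    return os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", value)


# Call this before the first CUDA allocation (safest: before importing torch).
allocator_conf = configure_cuda_allocator(128)
```

The variable is read when the CUDA allocator is first used, so setting it after training has started has no effect.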

4. Input type (torch.cuda.HalfTensor)

Error: RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same

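Fix:

The input reaches a layer in half precision while that layer's weights are still float32, so mixed precision (AMP) has to be turned off. The simple way: in the training script, find the line "amp = check_amp(model)" and add "amp = False" right below it.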
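Turning AMP off trades speed and memory for simplicity. An alternative, which is not the method used in this article, is to cast the input to the dtype of the layer's weights inside the custom module that triggers the error. The sketch below runs on CPU and uses a hypothetical helper match_module_dtype; the Conv2d layer is a stand-in for whatever module you added.

```python
import torch
import torch.nn as nn


def match_module_dtype(module, x):
    """Cast x to the dtype and device of the module's first parameter."""
    param = next(module.parameters())
    return x.to(device=param.device, dtype=param.dtype)


conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # float32 weights by default
x_half = torch.randn(1, 3, 16, 16).half()         # half-precision input, as seen under AMP
out = conv(match_module_dtype(conv, x_half))       # input and weights now share a dtype
```

This only addresses the dtype mismatch for that one layer; if several custom layers are affected, disabling AMP as described above is the quicker fix.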

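For two related problems, Gflops not being printed and resuming training from a checkpoint, see my earlier posts:

Fix for Gflops not being printed after modifying the YOLOv9 model (CSDN blog)

How to resume YOLOv9 training from a checkpoint instead of starting over (CSDN blog)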
