Notes on bugs encountered while training YOLOv5

monoecious

Last modified 2023-11-03 15:40:29

Tags: YOLO, bug

First published 2023-11-01 11:18:31

Article link: https://blog.csdn.net/m0_54884642/article/details/134157338

1. No labels found in E:\datasets\aerial_crack_labels\datasets_strange\score\train\labels\. See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data

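The loader derives each label path from the path of the corresponding file in the images folder, and that lookup failed.

Fix:

Define the function img2label_paths in datasets.py: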

# ------------------------- function definition -------------------------
import os

def img2label_paths(img_paths):
    # Define label paths as a function of image paths
    sa, sb = os.sep + 'images' + os.sep, os.sep + 'labels' + os.sep  # /images/, /labels/ substrings
    return [sb.join(x.rsplit(sa, 1)).rsplit('.', 1)[0] + '.txt' for x in img_paths]
# -----------------------------------------------------------------------

Comment out the original assignment to self.labels_files and reassign it from the result of img2label_paths.

Re-run train.py and training proceeds normally.
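To see what the helper does to a path, here is a small sketch. The directory names are hypothetical. The second path shows that when a path has no images directory component, the function only swaps the extension, so the label file is then expected next to the image.

```python
import os


def img2label_paths(img_paths):
    """Map each image path to its label path: last /images/ -> /labels/, extension -> .txt."""
    sa = os.sep + "images" + os.sep
    sb = os.sep + "labels" + os.sep
    return [sb.join(p.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt" for p in img_paths]


# Hypothetical dataset layout, built with os.path.join so it works on any OS.
good = os.path.join("datasets", "score", "train", "images", "img_001.jpg")
bad = os.path.join("datasets", "score", "train", "pictures", "img_001.jpg")

label_good, label_bad = img2label_paths([good, bad])
print(label_good)  # path under .../train/labels/
print(label_bad)   # path still under .../train/pictures/
```

In the dataset class the returned list would be assigned to the label-file attribute (written as self.labels_files in this post). The exact attribute names vary between YOLOv5 versions, so check your own copy of datasets.py.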

2. Do not know how to handle these types to promote: {'DoubleTensor', 'FloatTensor'}

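This error is raised every time, right after the first epoch finishes.

It occurs in early versions of YOLOv5. There are two ways to fix it: switch to a newer version, or patch the source.

The error is located in torch_utils.py. The failing source is: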

    def update(self, model):
        self.updates += 1
        d = self.decay(self.updates)
        with torch.no_grad():
            if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel):
                msd, esd = model.module.state_dict(), self.ema.module.state_dict()
            else:
                msd, esd = model.state_dict(), self.ema.state_dict()

            for k, v in esd.items():
                if v.dtype.is_floating_point:
                    v *= d
                    v += (1. - d) * msd[k].detach()

Add an if check inside the for loop. The modified source is:

    def update(self, model):
        self.updates += 1
        d = self.decay(self.updates)
        with torch.no_grad():
            if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel):
                msd, esd = model.module.state_dict(), self.ema.module.state_dict()
            else:
                msd, esd = model.state_dict(), self.ema.state_dict()

            for k, v in esd.items():
                if v.dtype.is_floating_point:
                    # ------------------------ added code ------------------------
                    if v.dtype != msd[k].dtype:
                        v = v.to(msd[k].dtype)
                    # --------------------- end of added code ---------------------
                        v *= d
                        v += (1. - d) * msd[k].detach()

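The code now runs.

However, mAP and the other metrics no longer update. The added if statement is the likely cause: the two update lines now sit inside the dtype check, so they are skipped whenever the dtypes already match, and when the dtypes differ v.to(...) returns a new tensor, so the in-place updates never reach the EMA weights. The fix:

Add the following to the same script: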

def is_parallel(model):
    # Returns True if model is of type DP or DDP
    return type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)


def de_parallel(model):
    # De-parallelize a model: returns single-GPU model if model is of type DP or DDP
    return model.module if is_parallel(model) else model

and rewrite the update function:

    def update(self, model):
        # Update EMA parameters
        with torch.no_grad():
            self.updates += 1
            d = self.decay(self.updates)

            msd = de_parallel(model).state_dict()  # model state_dict
            for k, v in self.ema.state_dict().items():
                if v.dtype.is_floating_point:
                    v *= d
                    v += (1 - d) * msd[k].detach()

With that, everything runs fine. (Sigh, this bug held me up for days.)
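The two lines inside the loop implement an exponential moving average: each floating-point entry becomes d times its old value plus (1 - d) times the current model value. Below is a minimal sketch of that rule with plain Python floats standing in for tensors. The function name ema_update and the fixed value of d are assumptions for illustration. In the post, d comes from self.decay(self.updates).

```python
def ema_update(ema_state, model_state, d):
    """Update ema_state in place: ema = d * ema + (1 - d) * model, float entries only."""
    for k, v in ema_state.items():
        if isinstance(v, float):  # stand-in for v.dtype.is_floating_point
            ema_state[k] = d * v + (1.0 - d) * model_state[k]
    return ema_state


# Integer entries (for example batch counters) are left untouched.
ema = {"weight": 1.0, "num_batches_tracked": 3}
model = {"weight": 0.0, "num_batches_tracked": 4}

ema_update(ema, model, d=0.9)
print(ema)
```

The important property is that the stored EMA entry itself is modified on every call. An update applied to a temporary copy, as in the patched version with v.to(...), leaves the stored values unchanged.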

3. TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Traceback (most recent call last):
  File "train.py", line 776, in <module>
    train(hyp)
  File "train.py", line 438, in train
    results, maps, times = test.test(opt.data,
  File "E:\pruning-yolov5\mobile-yolov5-pruning-distillation-master\test.py", line 219, in test
    output_to_target(
  File "E:\pruning-yolov5\mobile-yolov5-pruning-distillation-master\utils\utils.py", line 979, in output_to_target
    return np.array(targets)
  File "D:\anaconda\conda\envs\pytorch1.7\lib\site-packages\torch\tensor.py", line 621, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

As the message suggests, first move the tensor to the CPU:

Change self.numpy() to self.cpu().numpy()

That's all it takes.
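The self.numpy() call in the traceback lives in the installed torch package (torch\tensor.py). An alternative that avoids editing that package is to do the conversion in your own code, at the point where the array is built. The sketch below assumes PyTorch and NumPy are installed. The helper name to_numpy is hypothetical, and the target values are made up.

```python
import numpy as np
import torch


def to_numpy(x):
    """Return x as a NumPy array; tensors are detached and copied to host memory first."""
    if isinstance(x, torch.Tensor):
        return x.detach().cpu().numpy()
    return np.asarray(x)


# Use the GPU when one is available so the CUDA case is actually exercised.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
targets = [
    torch.tensor([0.0, 1.0, 0.5], device=device),
    torch.tensor([0.0, 2.0, 0.25], device=device),
]

# Convert each tensor before building the array, instead of np.array(targets).
arr = np.array([to_numpy(t) for t in targets])
print(arr.shape)  # (2, 3)
```

Calling .cpu() on a tensor that is already on the CPU returns it unchanged, so the same helper works on machines with and without a GPU.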
