Training ScaledYOLOv4 on Your Own Dataset (with Fixes for Various Errors)

Author: 自律的小陈

Last modified: 2024-05-13 17:22:14

Tags: YOLO

First published: 2024-05-13 15:48:27

Article link: https://blog.csdn.net/weixin_44688182/article/details/138804135

1. Prepare the dataset: simply organize it in YOLOv5 format.
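YOLOv5 format means images and labels in parallel `images/` and `labels/` folders, one `.txt` label file per image (lines of `class x_center y_center width height`, normalized to the range 0 to 1), plus a data YAML like the `/home/mydata/mydata.yaml` passed to train.py later. A minimal sketch, assuming the `train`/`val`/`nc`/`names` keys used by YOLOv5-style data files and the usual images-to-labels path swap; the helper names and paths are hypothetical:

```python
import json
import os


def label_path_for(image_path: str) -> str:
    """Map an image path to its label file (assumed YOLOv5 convention):
    swap the 'images' directory for 'labels' and use a .txt extension."""
    root, _ext = os.path.splitext(image_path)
    return root.replace("/images/", "/labels/", 1) + ".txt"


def make_data_yaml(train_dir: str, val_dir: str, names: list) -> str:
    """Build the text of a minimal data YAML (hypothetical helper)."""
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        f"nc: {len(names)}",
        # A JSON list is also valid YAML flow syntax, so class names get quoted safely.
        f"names: {json.dumps(names)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_data_yaml("/home/mydata/images/train", "/home/mydata/images/val", ["cat", "dog"]))
    print(label_path_for("/home/mydata/images/train/001.jpg"))
```

Write the returned text to your YAML file and pass that path to `--data`.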

2. Environment setup

2.1 RTX 20-series GPUs (CUDA 10.1)
First, create a virtual environment (I won't go into detail here):

conda create -n scaledyolov4 python=3.8

Then activate the virtual environment:

conda activate scaledyolov4

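Once inside the environment, install the dependencies.
ScaledYOLOv4 does not ship a requirements.txt, but its environment is the same as YOLOR's, so you can use YOLOR's file: https://github.com/WongKinYiu/yolor/blob/main/requirements.txt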
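If your GPU is a 2080 Ti, you can install straight from this requirements.txt. It asks you to install Cython and numpy first, which you can do directly with pip: pip install Cython numpy
Once Cython and numpy are installed, install the dependencies listed in the file (this command uses the Tsinghua PyPI mirror):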

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt

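ScaledYOLOv4 uses the Mish activation, which is implemented as a C++/CUDA extension and therefore has to be compiled. Build mish-cuda as follows.
First, download mish-cuda from GitHub: https://github.com/JunnYu/mish-cuda
Put the download in the ScaledYOLOv4 root directory, then run cd mish-cuda-master
Inside that directory, run: python setup.py build install
The build should complete successfully.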
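For reference, mish-cuda provides a GPU kernel for the Mish activation, Mish(x) = x * tanh(softplus(x)). A plain-Python sketch of the same formula, useful only for checking values by hand (it is not a replacement for the compiled extension):

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflowing exp() for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for v in (-2.0, 0.0, 1.0, 2.0):
        print(f"mish({v}) = {mish(v):.4f}")
```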
Once it is built, you can start training:

python train.py --batch-size 8 --img 896 896 --data /home/mydata/mydata.yaml --cfg models/yolov4-p5.yaml --weights 'yolov4-p5.pt' --name yolov4-p5
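The two numbers after `--img` are the train and test image sizes. YOLOv5-style training scripts check that each size is a multiple of the model's largest stride and round it up if it is not. A sketch of that rounding, assuming a largest stride of 32 for a P3 to P5 head such as yolov4-p5 (an assumption; `max(model.stride)` gives the real value for your model):

```python
import math


def round_up_to_stride(img_size: int, stride: int = 32) -> int:
    """Smallest multiple of `stride` that is >= img_size."""
    return int(math.ceil(img_size / stride) * stride)


if __name__ == "__main__":
    for size in (896, 900):
        fixed = round_up_to_stride(size)
        print(size, "ok" if fixed == size else f"would be rounded up to {fixed}")
```

896 is already a multiple of 32 (and of 64), so it is used unchanged.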

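If you installed the requirements.txt above on an RTX 30-series GPU, you can still compile mish-cuda. Run export TORCH_CUDA_ARCH_LIST="7.5" to build for a lower architecture, because that PyTorch build cannot compile for compute_86 (the 3090's architecture), although it does work with compute_80: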

export TORCH_CUDA_ARCH_LIST="7.5"

Then rebuild mish-cuda the same way as before.
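If you would rather script the export and rebuild, the same thing can be done from Python by passing a modified environment to the build command. A minimal sketch; the directory name `mish-cuda-master` comes from the steps above, and the helper names are hypothetical. For reference, a 2080 Ti reports compute capability 7.5 and a 3090 reports 8.6.

```python
import os
import subprocess
import sys


def build_env(arch_list: str) -> dict:
    """Copy of the current environment with TORCH_CUDA_ARCH_LIST set;
    PyTorch's extension builder reads it to choose the target GPU architectures."""
    env = dict(os.environ)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list
    return env


def rebuild_mish_cuda(src_dir: str = "mish-cuda-master", arch_list: str = "7.5") -> None:
    """Run `python setup.py build install` inside src_dir with the given arch list."""
    subprocess.run(
        [sys.executable, "setup.py", "build", "install"],
        cwd=src_dir,
        env=build_env(arch_list),
        check=True,  # raise CalledProcessError if the build fails
    )
```

Call `rebuild_mish_cuda()` from the ScaledYOLOv4 root with the virtual environment activated, so `sys.executable` is that environment's Python.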
2.2 Training on RTX 30-series GPUs
In your virtual environment, install PyTorch built for CUDA 11.1:

pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html

Everything else comes from the same requirements.txt as in section 2.1; just remember to delete the torch 1.7.0 and torchvision 0.8.1 entries from the file first.
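Deleting the two lines by hand is easiest, but if you want to script it, here is a minimal sketch that drops the `torch` and `torchvision` entries and keeps everything else, including comments and `torchaudio` (the function name is hypothetical):

```python
import re

# Captures the package name at the start of a requirements line,
# e.g. "torch" in "torch>=1.7.0" or "torchvision" in "torchvision==0.8.1".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def filter_requirements(lines, drop=("torch", "torchvision")):
    """Return the requirement lines without the packages named in `drop`."""
    dropped = {d.lower() for d in drop}
    kept = []
    for line in lines:
        m = _NAME.match(line)
        if m and m.group(1).lower() in dropped:
            continue  # skip this package; comments and blank lines never match
        kept.append(line)
    return kept
```

For example, read requirements.txt with `splitlines()`, pass the lines through `filter_requirements`, and write the result back before running the pip install command.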
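Training works the same way as above, but with PyTorch 1.8 and CUDA 11.1 (on a 3090) it raises an error during training.
The fix is to edit models/yolo.py around line 141 and wrap the in-place bias updates in with torch.no_grad():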
Before:

for mi, s in zip(m.m, m.stride):  # from
    b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
    b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
    mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

After:

for mi, s in zip(m.m, m.stride):  # from
    b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
    with torch.no_grad():  # b is a view of a parameter, so autograd must not track these in-place edits
        b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
        b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
    mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

After this change, training runs normally.
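For context, the two wrapped lines only add constant prior offsets to the detection-head biases: the objectness bias gets log(8 / (640 / s)^2), i.e. roughly 8 objects per 640-pixel image spread over the (640 / s)^2 grid cells at stride s, and each class bias gets log(0.6 / (nc - 0.99)). A plain-math sketch of those values, assuming strides 8, 16 and 32 and an illustrative nc of 80:

```python
import math


def obj_bias_offset(stride: int, img: int = 640, objects_per_img: float = 8.0) -> float:
    """Objectness prior: log(8 / (640 / s) ** 2), about 8 objects
    spread over the (640 / s) ** 2 grid cells at stride s."""
    return math.log(objects_per_img / (img / stride) ** 2)


def cls_bias_offset(nc: int) -> float:
    """Class prior used when no class frequencies are given: log(0.6 / (nc - 0.99))."""
    return math.log(0.6 / (nc - 0.99))


if __name__ == "__main__":
    for s in (8, 16, 32):
        print(f"stride {s:2d}: obj offset {obj_bias_offset(s):.3f}")
    print(f"nc=80: cls offset {cls_bias_offset(80):.3f}")
```

Wrapping the updates in `torch.no_grad()` does not change these values; it only stops autograd from recording an in-place update on a view of a parameter.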

Below are some problems I ran into during training.

1. AttributeError: module 'numpy' has no attribute 'int'.
np.int was a deprecated alias for the builtin int. To avoid this error in existing code, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Fix: following the traceback, open utils/datasets.py and change np.int to int at line 316.
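The `np.int` alias was deprecated in NumPy 1.20 and removed in 1.24, so other files may hit the same error. A small sketch for finding and rewriting `np.int` while leaving `np.int32`, `np.int64` and `np.int_` alone, assuming the repository's Python files are UTF-8; the helper names are hypothetical, and each hit is worth reviewing before rewriting:

```python
import re
from pathlib import Path

# "np.int" followed by a word boundary: matches "np.int(" or "np.int)"
# but not "np.int32", "np.int64", "np.intc" or "np.int_".
_NP_INT = re.compile(r"\bnp\.int\b")


def fix_np_int(source: str) -> str:
    """Replace the removed alias np.int with the builtin int."""
    return _NP_INT.sub("int", source)


def find_np_int(repo_root: str = "."):
    """Yield (path, line_no, line) for every remaining np.int use under repo_root."""
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for no, line in enumerate(text.splitlines(), start=1):
            if _NP_INT.search(line):
                yield path, no, line.strip()
```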

2. Error in training: _pickle.UnpicklingError: STACK_GLOBAL requires str
Fix: this is caused by stale dataset cache files. Go to the dataset's images and labels folders and delete the cache files.
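The label cache is rebuilt on the next run when it is missing, so deleting it is safe. A sketch that lists (and optionally deletes) the cache files, assuming they end in `.cache` like the label caches written by YOLOv5-style loaders; the helper names are hypothetical:

```python
import os


def find_cache_files(dataset_root: str) -> list:
    """Return paths of all *.cache files below dataset_root.
    os.walk yields nothing if the directory does not exist."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(dataset_root):
        hits.extend(os.path.join(dirpath, f) for f in filenames if f.endswith(".cache"))
    return sorted(hits)


def delete_cache_files(dataset_root: str, dry_run: bool = True) -> list:
    """Delete the cache files found; with dry_run=True only print them."""
    files = find_cache_files(dataset_root)
    for f in files:
        print(("would delete: " if dry_run else "deleting: ") + f)
        if not dry_run:
            os.remove(f)
    return files
```

Run `delete_cache_files("/home/mydata")` first to see what would be removed, then call it again with `dry_run=False`.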

3. cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
TypeError: No loop matching the specified signature and casting was found for ufunc greater
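Fix: following the traceback, open summary.py at the reported path, conda/envs/yolov4/lib/python3.8/site-packages/torch/utils/tensorboard/summary.py, go to line 324, and remove dtype=np.int32 from the line cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32)). np.greater then returns a boolean array, and np.cumsum counts each True as 1.

4. AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'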
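Fix: downgrade Pillow to a version that still has Image.ANTIALIAS (it was removed in Pillow 10.0), for example 9.5.0. Uninstall the current version first, then reinstall: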

pip uninstall -y Pillow
pip install Pillow==9.5.0
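To check whether the installed Pillow is affected before starting a long run, here is a small sketch using only the standard library (the helper names are hypothetical). A different fix, not the one used above, is to replace `Image.ANTIALIAS` with `Image.LANCZOS` in the code that calls it, since ANTIALIAS was an alias of LANCZOS.

```python
from importlib.metadata import PackageNotFoundError, version


def pillow_lacks_antialias(pillow_version: str) -> bool:
    """Image.ANTIALIAS no longer exists from Pillow 10.0 onward."""
    return int(pillow_version.split(".")[0]) >= 10


def check_installed_pillow() -> str:
    """Return a short status message for the Pillow in this environment."""
    try:
        v = version("Pillow")
    except PackageNotFoundError:
        return "Pillow is not installed"
    if pillow_lacks_antialias(v):
        return f"Pillow {v} has no Image.ANTIALIAS; install Pillow==9.5.0"
    return f"Pillow {v} still provides Image.ANTIALIAS"
```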

5. TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Fix: open utils/general.py, go to around line 1103, and replace the function containing that line with the code below:

# (utils/general.py already imports these at the top of the file)
import numpy as np
import torch


def output_to_target(output, width, height):
    # Convert model output to target format [batch_id, class_id, x, y, w, h, conf]
    if isinstance(output, torch.Tensor):
        output = output.cpu().numpy()
    targets = []
    for i, o in enumerate(output):
        if o is not None:
            if isinstance(o, torch.Tensor):
                # NMS returns one CUDA tensor per image; copy it to host memory before numpy use
                o = o.detach().cpu().numpy()
            for pred in o:
                box = pred[:4]
                w = (box[2] - box[0]) / width
                h = (box[3] - box[1]) / height
                x = box[0] / width + w / 2
                y = box[1] / height + h / 2
                conf = pred[4]
                cls = int(pred[5])
                targets.append([i, cls, float(x), float(y), float(w), float(h), float(conf)])
    return np.array(targets)
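The per-prediction arithmetic in that function turns a pixel box (x1, y1, x2, y2) into a normalized center/size box, the same layout as a YOLO label line. Once the values are on the CPU this is ordinary arithmetic; a plain-Python sketch of the conversion (the function name is hypothetical):

```python
def xyxy_to_xywhn(box, width, height):
    """Pixel corners (x1, y1, x2, y2) -> normalized (x_center, y_center, w, h)."""
    x1, y1, x2, y2 = box
    w = (x2 - x1) / width
    h = (y2 - y1) / height
    x = x1 / width + w / 2  # left edge plus half the box width
    y = y1 / height + h / 2  # top edge plus half the box height
    return x, y, w, h


if __name__ == "__main__":
    print(xyxy_to_xywhn((0, 0, 100, 50), 200, 100))
```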

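6. No module named 'mc.build.lib'
This is caused by an old-format weight file; switching to newer weights fixes it. However, the weight files officially provided for ScaledYOLOv4 can no longer be found on GitHub, so here is a copy shared by someone else that I found: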
https://www.flyai.com/m/yolov4-p5.pt
If you run into problems, feel free to ask; let's learn from each other.