[YOLO] Fixing problems encountered when training yolov5: BrokenPipeError: [Errno 32] Broken pipe

Hardware configuration

Hardware: Windows + GeForce RTX 3070 + CUDA 11.1 + Anaconda
YOLO version: YOLOv5-3.1
Source code: https://github.com/ultralytics/yolov5/releases/tag/v3.1
Training walkthrough: https://blog.csdn.net/qq_44703886/article/details/112801975?spm=1001.2014.3001.5501

Problem 1:

Traceback (most recent call last):
  File "train.py", line 460, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 460, in train
    model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc).to(device)
  File "D:\yolov5-3.1\models\yolo.py", line 90, in __init__
    self._initialize_biases()
  File "D:\yolov5-3.1\models\yolo.py", line 149, in _initialize_biases
    b[:, 4] += math.log(8 / (640 / s) ** 2)
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

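Solution: open the file named in the traceback, i.e. File "D:\yolov5-3.1\models\yolo.py", line 149, and wrap the two in-place bias updates in torch.no_grad().
The modified code is as follows: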

def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
    # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
    m = self.model[-1]  # Detect() module
    for mi, s in zip(m.m, m.stride):  # from
        b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
        with torch.no_grad():  # in-place edits on a view of a leaf tensor must not be tracked by autograd
            b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
        mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
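
The error and the fix can be reproduced outside yolov5 in a few lines. The sketch below is not the yolov5 code: it uses a single stand-in Conv2d layer, and the values of na, no and stride are made up for illustration. It assumes a PyTorch version that enforces the in-place check; on older versions the first update may simply succeed instead of raising.

```python
import math

import torch

# Stand-in for one Detect() conv layer: 3 anchors x (5 + 1 class) = 18 outputs.
# These sizes are illustrative only, not taken from a real yolov5 config.
na, no = 3, 6
conv = torch.nn.Conv2d(16, na * no, kernel_size=1)
stride = 8.0

# 1) Try the original pattern: an in-place update on a view of a leaf Parameter.
b = conv.bias.view(na, -1)
try:
    b[:, 4] += math.log(8 / (640 / stride) ** 2)
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)

# 2) Apply the same update with autograd tracking switched off.
before = conv.bias.detach().clone()
b = conv.bias.view(na, -1)
with torch.no_grad():
    b[:, 4] += math.log(8 / (640 / stride) ** 2)
conv.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

# Only column 4 (the objectness bias) should have moved.
delta = (conv.bias.detach() - before).view(na, -1)
print(delta[:, 4])  # each entry shifted by log(8 / 80**2)
```

The bias is still a trainable Parameter afterwards; torch.no_grad() only stops autograd from recording the one-off initialization.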

Problem 2:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\Anaconda\envs\DIP\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\Anaconda\envs\DIP\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\Anaconda\envs\DIP\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\yolo\yolov5\train.py", line 11, in <module>
    import torch.distributed as dist
  File "D:\Anaconda\envs\DIP\lib\site-packages\torch\__init__.py", line 117, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\Anaconda\envs\DIP\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.

Traceback (most recent call last):
  File "D:/yolo/yolov5/train.py", line 522, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "D:/yolo/yolov5/train.py", line 185, in train
    image_weights=opt.image_weights, quad=opt.quad, prefix=colorstr('train: '))
  File "D:\yolo\yolov5\utils\datasets.py", line 83, in create_dataloader
    collate_fn=LoadImagesAndLabels.collate_fn4 if quad else LoadImagesAndLabels.collate_fn)
  File "D:\yolo\yolov5\utils\datasets.py", line 96, in __init__
    self.iterator = super().__iter__()
  File "D:\Anaconda\envs\DIP\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "D:\Anaconda\envs\DIP\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Anaconda\envs\DIP\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Anaconda\envs\DIP\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
Process finished with exit code 1

Solution:

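  • Many solutions online suggest setting the num_workers argument of torch.utils.data.DataLoader() to 0.
    However, I could not find that call anywhere in the yolov5 source. It turns out that yolov5 defines its own InfiniteDataLoader class, which inherits from torch.utils.data.dataloader.DataLoader (i.e. torch.utils.data.DataLoader).
    The relevant code is in ./utils/datasets.py in the yolov5 project (around lines 68-78), as shown in the figure below.
  • With num_workers = 0 training does run, but I noticed that the GPU memory usage shown in the terminal was very high while GPU utilization stayed below 30%. I therefore raised the value step by step (0, 2, 4, 8, 16, ...). At 8 the BrokenPipeError: [Errno 32] Broken pipe shown above came back, so I settled on 4, which runs successfully, makes much better use of the GPU, and speeds up training considerably. The OSError: [WinError 1455] in the traceback suggests a likely reason: on Windows each worker is a separate process that imports torch and loads its DLLs again, so every extra worker needs more page-file-backed memory. (I therefore recommend choosing a value that suits your own machine.)
  • For GPU memory usage, GPU utilization, and how to speed up yolov5 training, I recommend this blog post: https://blog.csdn.net/qq_32998593/article/details/92849585. It is very thorough.
    [Figure not preserved]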
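
The advice to pick a worker count that suits the machine can be sketched as a small helper. This is only an illustration: pick_num_workers is a hypothetical function, not part of yolov5 or PyTorch, and the bounds it applies (CPU cores per training process, and no workers for a batch size of 1) are assumptions about what is reasonable. It does not account for memory or page-file limits, which is what triggered the failure above, so the requested value still has to be found by trial as described.

```python
import os


def pick_num_workers(requested, batch_size, cpu_count=None, world_size=1):
    """Return a DataLoader worker count bounded by CPU cores and batch size.

    Hypothetical helper for illustration; the names and bounds are assumptions.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    cpus_per_process = cpu_count // max(world_size, 1)
    # With a batch size of 1 there is nothing to parallelize across workers.
    batch_bound = batch_size if batch_size > 1 else 0
    return max(0, min(cpus_per_process, batch_bound, requested))


if __name__ == "__main__":
    # Walk the same candidate values tried in the post on an assumed 8-core CPU.
    for requested in (0, 2, 4, 8, 16):
        chosen = pick_num_workers(requested, batch_size=16, cpu_count=8)
        print(f"requested={requested} -> num_workers={chosen}")
```

The result would be passed as num_workers when the data loader is constructed. On Windows the script that starts training must keep its entry point under an if __name__ == "__main__": guard, because each worker process re-imports the main module.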