Errors encountered while training YOLOv5

zhangxiangweide

License: CC 4.0 BY-SA

Category: pytorch. Tags: pytorch

Original article: https://blog.csdn.net/zhangxiangweide/article/details/125781044

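Summary: while using PyTorch, the "Leaking Caffe2 thread-pool" warning appeared, and it is related to the DataLoader's pin_memory setting. When host memory is short, setting pin_memory to False avoids the problem, at the cost of slower data transfer to the GPU. This post also covers a RuntimeError caused by an in-place operation on a variable that requires grad, a TypeError raised when converting a CUDA tensor to numpy, and ImportErrors caused by missing shared libraries.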

PyTorch issue (1): pinned memory: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

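While PyTorch was running, the warning "Leaking Caffe2 thread-pool after fork. (function pthreadpool)" was printed over and over.
In this case it showed up with pin_memory set to True in the DataLoader.
Host memory comes in two kinds: pinned (page-locked) and pageable. Pinned memory is never swapped out to the host's swap space on disk, while pageable memory can be swapped out to disk when the host runs short of RAM.
pin_memory refers to pinned memory. Creating a DataLoader with pin_memory=True means the Tensors it produces start out in pinned host memory, which makes transferring them from host memory to GPU memory faster. On a well-equipped machine with plenty of RAM, pin_memory=True is a reasonable setting. If host memory is tight, however, pin_memory=True can lead to this warning.
Fix: set pin_memory=False in the DataLoader call. The data can then be paged out to swap when memory runs short; the trade-off is that feeding data to the GPU becomes slower.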

Before:
DataLoader(val_dataset, shuffle=True, batch_size=Batch_size, num_workers=4, pin_memory=True,
           drop_last=True, collate_fn=yolo_dataset_collate)

After:
DataLoader(val_dataset, shuffle=True, batch_size=Batch_size, num_workers=4, pin_memory=False,
           drop_last=True, collate_fn=yolo_dataset_collate)
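To make the switch easy to flip, pin_memory can be exposed as an explicit argument. The following is only a minimal sketch: build_loader and toy_dataset are hypothetical names that do not appear in the original training script, the custom collate_fn is left out, and num_workers defaults to 0 so the snippet runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_loader(dataset, batch_size, pin_memory=False, num_workers=0):
    """Hypothetical helper: create a DataLoader with pin_memory as an explicit switch."""
    return DataLoader(
        dataset,
        shuffle=True,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
        drop_last=True,
    )


# Stand-in data (assumption): 10 fake "images" with one label each.
toy_dataset = TensorDataset(torch.zeros(10, 3, 8, 8), torch.zeros(10))
loader = build_loader(toy_dataset, batch_size=4, pin_memory=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch_shapes = []
for images, labels in loader:
    # non_blocking copies only help when the source tensor is pinned,
    # so the flag simply follows the loader's pin_memory setting.
    images = images.to(device, non_blocking=loader.pin_memory)
    batch_shapes.append(tuple(images.shape))
```

With drop_last=True, the 10 samples yield two full batches of 4 and the last 2 samples are dropped.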

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy

Hint: in the line that raises the error, change self.numpy() to self.cpu().numpy().

AAE_x_hat = AAE_x_hat.detach().cpu().numpy().squeeze()
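The same conversion can be wrapped in a small helper. This is a sketch only: to_numpy and x_hat are illustrative names, not part of the original code. detach() drops the autograd graph, cpu() copies the data to host memory (a no-op for tensors already on the CPU), and numpy() then succeeds.

```python
import numpy as np
import torch


def to_numpy(tensor):
    """Hypothetical helper: convert a tensor on any device to a NumPy array."""
    return tensor.detach().cpu().numpy()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# A tensor that requires grad and may live on the GPU, like a model output.
x_hat = torch.ones(1, 4, device=device, requires_grad=True)
arr = to_numpy(x_hat).squeeze()
```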

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

Before:
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

After:
    def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
        # https://arxiv.org/abs/1708.02002 section 3.3
        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        m = self.model[-1]  # Detect() module
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            with torch.no_grad():  # in-place edits on a view of a leaf tensor must not be tracked by autograd
                b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
                b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
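The error and the fix can be reproduced outside YOLOv5 with a single convolution layer. This is a minimal sketch under assumptions: the layer sizes, na = 3 and stride = 8 are made-up stand-ins for the Detect() module's values, not taken from the real model.

```python
import math
import torch

# Stand-in for one Detect() conv: 255 output channels = 3 anchors * 85 values.
conv = torch.nn.Conv2d(16, 255, kernel_size=1)
na, stride = 3, 8
before = conv.bias.detach().clone()
shift = math.log(8 / (640 / stride) ** 2)

b = conv.bias.view(na, -1)  # view of a leaf tensor that requires grad

# With autograd recording enabled, the in-place update is rejected.
error_message = None
try:
    b[:, 4] += shift
except RuntimeError as exc:
    error_message = str(exc)

# Inside no_grad the same update is allowed.
with torch.no_grad():
    b[:, 4] += shift
conv.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

# Difference between the new and old bias, reshaped to (anchors, values).
delta = (conv.bias.detach() - before).view(na, -1)
```

Only column 4 (the objectness bias) changes, and it changes by the same amount for each of the three anchors.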

AttributeError: module 'distutils' has no attribute 'version' (fix)

pip uninstall setuptools
pip install setuptools==59.5.0
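Before downgrading, it can help to check which setuptools version is actually installed. The sketch below assumes, based only on the pin used above, that versions newer than 59.5.0 are the ones worth checking; the helper names are hypothetical.

```python
import re
from importlib import metadata

PINNED = (59, 5, 0)  # the version this post pins to


def parse_version(text):
    """Turn a string like '65.5.1' into a tuple (65, 5, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_newer_than_pin(version_text, pin=PINNED):
    """Return True if version_text is strictly newer than the pinned version."""
    return parse_version(version_text) > pin


def installed_setuptools_version():
    """Return the installed setuptools version string, or None if it is not installed."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None


current = installed_setuptools_version()
if current is not None and is_newer_than_pin(current):
    print(f"setuptools {current} is newer than 59.5.0; the downgrade above may apply")
```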

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

apt-get update
apt-get install libgl1-mesa-glx

ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory

apt-get update
apt-get install libglib2.0-dev
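Whether these shared libraries are visible can be checked from Python before importing the package that needs them. This is a rough sketch: find_missing_libraries is a hypothetical helper, and ctypes.util.find_library depends on tools such as ldconfig being present, so a reported miss in a minimal container is a hint rather than proof.

```python
import ctypes.util


def find_missing_libraries(names):
    """Return the library names that ctypes cannot locate on this system."""
    return [name for name in names if ctypes.util.find_library(name) is None]


# "GL" corresponds to libGL.so.1 and "gthread-2.0" to libgthread-2.0.so.0.
missing = find_missing_libraries(["GL", "gthread-2.0"])
if missing:
    print("Not found:", ", ".join(missing))
```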