PyTorch Development Bug Log 2

TODO_D2D

Category: Program Development. Tags: python, linux, ubuntu

Article link: https://blog.csdn.net/weixin_44605991/article/details/123257897

Problems encountered while setting up the mmdetection object detection framework:

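Error 1:

ValueError: num_samples should be a positive integer value, but got num_samples=0

Solution:
In the data_loader, change "shuffle=True" to "shuffle=False".
Cause:
The shuffle parameter was set incorrectly: a batch_sampler is already in place, so shuffle is not needed for random sampling. Note that PyTorch's RandomSampler raises this error when the dataset length is 0, so it is also worth checking that the dataset paths and annotation files actually load samples.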
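Since the message reports zero samples, a quick length check before building the data loader can make the failure easier to read. The following is a minimal sketch; `require_non_empty` is a hypothetical helper, not part of PyTorch or mmdetection, and it only assumes the dataset supports `len()`.

```python
def require_non_empty(dataset, name="dataset"):
    """Hypothetical helper: fail early with a readable message if a dataset is empty."""
    num_samples = len(dataset)
    if num_samples == 0:
        raise ValueError(
            f"{name} contains 0 samples; check the annotation file, "
            "the image directory, and the class names in the config"
        )
    return num_samples


# Usage: call this on the dataset object before it is passed to the data loader.
print(require_non_empty([10, 20, 30], name="train set"))
```

If the check fails, turning off shuffle would only hide the problem, because the loader would still have nothing to iterate over.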

Error 2:

AssertionError: Default process group is not initialized

Solution:
Add the following to "tools/train.py":

import torch.distributed as dist
dist.init_process_group('gloo', init_method='file:///tmp/somefile', rank=0, world_size=1)

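Cause:
A non-distributed training run is using settings meant for distributed training. If the fix above does not help, the problem may be in the model files under 'configs/_base_/models':

norm_cfg=dict(type='SyncBN', requires_grad=True),

Change "SyncBN" to "BN".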
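When SyncBN appears in several places in a config, editing each entry by hand is easy to get wrong. As a rough sketch, the replacement can be applied recursively to a nested config dict. `replace_syncbn` and the sample `model` dict are made up for illustration; they are not an mmdetection API.

```python
def replace_syncbn(cfg):
    """Return a copy of a nested config with every type='SyncBN' changed to 'BN'."""
    if isinstance(cfg, dict):
        out = {key: replace_syncbn(value) for key, value in cfg.items()}
        if out.get("type") == "SyncBN":
            out["type"] = "BN"
        return out
    if isinstance(cfg, (list, tuple)):
        # Rebuild the same container type with converted elements.
        return type(cfg)(replace_syncbn(item) for item in cfg)
    return cfg


# Hypothetical config fragment, only for demonstration.
model = dict(
    backbone=dict(type="ResNet", norm_cfg=dict(type="SyncBN", requires_grad=True)),
    neck=[dict(type="FPN", norm_cfg=dict(type="SyncBN", requires_grad=True))],
)
single_gpu_model = replace_syncbn(model)
print(single_gpu_model["backbone"]["norm_cfg"])
```

The function returns a new structure and leaves the input untouched, so the original distributed config can still be used for multi-GPU runs.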

Problems encountered while setting up mmrotate, which builds on the mmdetection object detection framework:

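Error 3:

ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

Solution:
Reinstall the CUDA toolkit at the version named in the error (10.2 in my case):

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

Note that related dependencies such as mmcv and mmdet may also need to be reinstalled; adjust according to any further errors.
Cause:
1. A non-default cudatoolkit version was installed, so the 10.2 libraries are missing.
2. A shared library is missing from the dynamic loader's search path; the CUDA libraries can be linked in manually (this did not work for me, so my problem was probably a different one, but the steps are given below for anyone who wants to try):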

cd /usr/local
ls     # check which CUDA versions are installed, then adapt the commands below to your version

sudo ldconfig /usr/local/cuda-10.1/lib64
sudo ldconfig
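To see whether a fix took effect without importing the whole framework, the library named in the error can be loaded directly. This is a minimal sketch using the standard-library ctypes module; `can_load` is a hypothetical helper name, and the library name is the one from the error above.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can find and open the shared library."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


# Run inside the same environment (conda env, shell) that shows the ImportError.
print("libcudart.so.10.2 loadable:", can_load("libcudart.so.10.2"))
```

A result of False after running the commands above suggests the library is still not on the loader's search path, or that the requested version is not installed at all.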

Error 4:

ImportError: ~/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at5sliceERKNS_6TensorElN3c108optionalIlEES5_l

Solution:

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.6.0/index.html  # change the CUDA and torch versions to match your environment

Cause:
Reinstalling CUDA left the installed mmcv build incompatible with the CUDA or torch version, so mmcv must be reinstalled.
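The index URL in the command encodes the CUDA and torch versions, so it can be derived from the versions in use. The sketch below follows the pattern seen in the URL above; `mmcv_index_url` is a hypothetical helper, and a prebuilt package is not guaranteed to exist for every version pair.

```python
def mmcv_index_url(cuda_version, torch_version):
    """Build the mmcv-full find-links URL for a CUDA/torch pair, e.g. '10.2' and '1.6.0'."""
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cuda_tag}/torch{torch_version}/index.html"
    )


# Print the full install command for the versions used in this post.
print("pip install mmcv-full -f " + mmcv_index_url("10.2", "1.6.0"))
```

The versions to pass in can be read from `torch.__version__` and `torch.version.cuda` in the target environment.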
