Notes on problems encountered while reproducing the FSCE code

Category: Few-Shot Learning. Tags: deep learning, computer vision, artificial intelligence

Article link: https://blog.csdn.net/weixin_45897923/article/details/129132653

Prerequisites

paper: https://arxiv.org/abs/2103.05950
code: https://github.com/megvii-research/FSCE
Reference reproduction blog: https://blog.csdn.net/qiankendeNMY/article/details/128450196

I. Base environment

Windows 10
Python 3.8
CUDA 11.7
torch 1.13
torchvision 0.14

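II. Environment problems

1. Installing pycocotools and fvcore

On Windows, the install commands are as follows (type them without quotation marks):
pycocotools: pip install git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
fvcore: pip install git+https://github.com/facebookresearch/fvcore
Solution when these fail: download the matching package files from the web and install them manually. For details, see the "pycocotools下载" (pycocotools download) reference.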

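1.1 pycocotools (this can be left for later if you are not using the COCO dataset)

Switching between various package mirrors still produced assorted errors.

Reference:
"pycocotools下载" (pycocotools download)

After following the reference, the installation succeeded.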

1.2 fvcore

Reference: https://www.bilibili.com/read/cv19290223/
After following the reference, the installation succeeded.

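2. ModuleNotFoundError: No module named 'torch'

torch is clearly installed in the environment, yet the error says there is no torch module.

The command was run in a terminal:
python setup.py build develop # you might need sudo
The terminal had not entered the environment. The virtual environment configured in the IDE does contain torch, but the terminal session was evidently not using it (I do not fully understand why).

Solution:
In cmd, change to the directory that contains setup.py (the code folder), activate the environment with conda activate FSCE, and then run python setup.py build develop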
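
To confirm that the terminal really is using the FSCE environment before building, it can help to print which interpreter is running and whether it can see torch. The following is a minimal sketch; the helper name describe_environment is hypothetical and not part of FSCE.

```python
import importlib.util
import sys


def describe_environment(module_name="torch"):
    """Report the running interpreter and whether a top-level module is importable."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    info = describe_environment("torch")
    print("interpreter:", info["interpreter"])
    print("torch found:", info["found"], "at", info["location"])
```

If the printed interpreter path does not point into the conda environment, the environment has not been activated in that terminal.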

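3. Problems during python setup.py build develop

3.1 Invalid configs directory name

Linux solution reference: https://blog.csdn.net/qiankendeNMY/article/details/128450196
Windows solution reference: https://github.com/megvii-research/FSCE/issues/60
Under model_zoo, create config/configs, then replace the path in configs with your own path (written up to the parent directory). I used the directory that appeared in the error message.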

3.2 ROIAlignRotated_cuda.cu

error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\nvcc.exe' failed with exit code 1
Solution references: https://github.com/facebookresearch/maskrcnn-benchmark/issues/254
https://blog.csdn.net/tanmx219/article/details/87827035

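Locate the directory of the file that fails to compile and open the file (for example in Notepad). Three places need to be changed in total.
For example:

Add a ceil_div function near the top of the file.
Further down, modify the dim3 grid statements (two places).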
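
The ceil_div helper is integer ceiling division, which is how a kernel launch works out how many blocks are needed to cover all elements. The arithmetic is sketched below in Python purely for illustration; the real change is made in the .cu file. The block size of 512 and the cap of 4096 blocks are assumed values, not taken from this post, and grid_size is a hypothetical name.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling division for positive integers, without floating point."""
    return (a + b - 1) // b


def grid_size(num_elements: int, threads_per_block: int = 512, max_blocks: int = 4096) -> int:
    """Number of blocks needed to cover num_elements, capped at max_blocks.

    threads_per_block and max_blocks are assumed illustrative values.
    """
    return min(ceil_div(num_elements, threads_per_block), max_blocks)


if __name__ == "__main__":
    for n in (1, 512, 513, 10_000_000):
        print(n, "elements ->", grid_size(n), "blocks")
```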

3.3 ROIAlign_cuda.cu

The fix follows the same approach as in 3.2 above.

3.4 error: identifier "AT_CHECK" is undefined

Replace AT_CHECK with TORCH_CHECK on the lines reported in the error (I replaced every occurrence).
Reference blog: https://blog.csdn.net/weixin_44444492/article/details/118887280
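
Replacing every occurrence by hand is tedious when several files are affected. The following is a minimal sketch of a bulk replacement, assuming the sources are UTF-8 text; the function name replace_at_check and the list of file suffixes are illustrative choices, not something FSCE provides. Back up the source tree or commit it before running anything like this.

```python
import re
from pathlib import Path

# File types to scan; an assumed list, adjust as needed.
SOURCE_SUFFIXES = {".cu", ".cuh", ".cpp", ".h"}
# Match AT_CHECK only as a whole identifier.
AT_CHECK_PATTERN = re.compile(r"\bAT_CHECK\b")


def replace_at_check(root):
    """Rewrite AT_CHECK to TORCH_CHECK under root; return the files that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        new_text = AT_CHECK_PATTERN.sub("TORCH_CHECK", text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling replace_at_check on the folder that holds the C++/CUDA sources returns the list of modified files, which can then be reviewed before rebuilding.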

3.5 error C3861: "AT_CHECK": identifier not found

Replace AT_CHECK with TORCH_CHECK in deform_conv.h.

Finished!

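III. Problems while running

1. Some modules will be reported as missing at run time; install whichever module is missing. If a module is already installed, its version may not match, and reinstalling it fixes the problem.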

2. AttributeError: module 'os' has no attribute 'getuid'
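
os.getuid exists only on Unix-like systems, which is why the call fails on Windows. The fix used in this post was shown in a screenshot that is not preserved here, so the following is only one possible workaround, not necessarily the one the author applied. The helper name get_uid_or_default and the fallback value 0 are assumptions.

```python
import os


def get_uid_or_default(default: int = 0) -> int:
    """Return os.getuid() where it exists, otherwise a fixed fallback value."""
    # On Windows the os module has no getuid attribute at all.
    getuid = getattr(os, "getuid", None)
    return getuid() if callable(getuid) else default


if __name__ == "__main__":
    print("uid (or fallback):", get_uid_or_default())
```

A call site that uses os.getuid() could call this helper instead, so the same code runs on both Windows and Linux.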

3. AttributeError: 'Namespace' object has no attribute 'dist_url'

Fix: comment out the lines that reference it.

4. Out of GPU memory

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 15.99 GiB total capacity; 14.47 GiB already allocated; 158.00 MiB free; 14.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

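For a long time I could not find where to change the batch size. In the end I found IMS_PER_BATCH in the Base-RCNN-FPN config file and changed it to 4 (because there is only 1 GPU).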
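
In Detectron2-style configs, IMS_PER_BATCH is the total number of images per iteration across all GPUs, so the load on each GPU is that total divided by the GPU count. A minimal sketch of the arithmetic follows; the function name images_per_gpu is hypothetical, and the 16-image, 8-GPU case is an illustrative value that does not come from this post.

```python
def images_per_gpu(ims_per_batch: int, num_gpus: int) -> int:
    """Images each GPU processes per iteration, given the total batch size."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        raise ValueError("ims_per_batch must be divisible by num_gpus")
    return ims_per_batch // num_gpus


if __name__ == "__main__":
    # The setting from this post: IMS_PER_BATCH = 4 on a single GPU.
    print(images_per_gpu(4, 1))
    # Illustrative multi-GPU case (assumed values).
    print(images_per_gpu(16, 8))
```

With a single GPU the whole batch lands on one card, which is why a total that is comfortable on a multi-GPU machine can run out of memory here.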