Summary of issues encountered while debugging the SSD-pytorch code


duanyajun987


Code link: https://github.com/amdegroot/ssd.pytorch

1. To run demo-ssd.py, change line 49 of detection.py to:

if scores.numel() == 0:  # originally scores.dim() == 0

2. In multibox_loss.py, line 97:

During debugging, "loss_c[pos] = 0" fails with a shape-mismatch error because loss_c and pos have different shapes. Change the statement to:

loss_c[pos.view(-1,1)] = 0

Here view(-1,1) reshapes pos to match the shape of loss_c.
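The shape mismatch can be reproduced on small tensors. The sketch below assumes loss_c has been flattened to shape [batch * priors, 1] while pos is a boolean mask of shape [batch, priors]; the sizes 2 and 4 are invented for illustration, and a recent PyTorch with bool tensors is assumed.

```python
import torch

batch, priors = 2, 4  # illustrative sizes, not the real 8732 priors

# Per-prior confidence loss, flattened to [batch * priors, 1]
loss_c = torch.arange(1.0, batch * priors + 1).view(-1, 1)

# Boolean mask of positive priors with shape [batch, priors]
pos = torch.tensor([[True, False, False, True],
                    [False, True, False, False]])

# loss_c[pos] = 0 would fail here: the mask shape [2, 4] does not
# match the indexed tensor shape [8, 1].
mask = pos.view(-1, 1)   # reshape the mask to [batch * priors, 1]
loss_c[mask] = 0         # zero out the loss at positive priors

print(loss_c.view(batch, priors))
```

Reshaping loss_c to [batch, priors] before indexing would be another way to make the two shapes agree.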

3. In multibox_loss.py, N = num_pos.data.sum() has dtype torch.int64, while loss_l and loss_c, which are divided by it, have dtype torch.float32. Running the code raises an error similar to 'torch.cuda.LongTensor but found type torch.cuda.FloatTensor for argument'. Check the argument types and convert N to torch.float32:

N = num_pos.data.sum()

N = N.float()
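A minimal sketch of the dtype conversion, using made-up values for num_pos and a scalar stand-in for loss_l. Newer PyTorch releases may promote mixed integer and float operands automatically, so the explicit cast mainly matters on older versions such as 0.4.

```python
import torch

# Number of positive priors per image (integer counts), as in num_pos
num_pos = torch.tensor([[3], [5]])

N = num_pos.sum()        # dtype is torch.int64
int_dtype = N.dtype
N = N.float()            # convert so it matches the float32 losses

loss_l = torch.tensor(4.0)  # stand-in for the localization loss
loss_l = loss_l / N         # both operands are now float32

print(int_dtype, N.dtype, loss_l.item())
```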

4. In train.py, each iteration reads one batch of images with images, targets = next(batch_iterator). Once the iterator has no more data, next() raises a StopIteration exception. Replacing images, targets = next(batch_iterator) with the following handles the exception:

while True:
    try:
        # Fetch the next batch
        images, targets = next(batch_iterator)
    except StopIteration:
        # Leave the loop once StopIteration is raised
        break

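5. Fixing RuntimeError: CUDNN_STATUS_INTERNAL_ERROR:

The CUDA cache needs to be cleared using sudo, but sudo is a Linux command, so on Windows do the following:

(1) Create a new text file in any directory and name it sudo.js.

(2) Open the new file in Notepad and paste the following code.

Opening sudo.js from cmd then allows sudo operations.

(3) Run sudo rm -f ~/.nv/ (be sure not to omit the trailing "/", otherwise you will be told that ".nv" is a directory).

Note: when I ran the command in (3), my system reported 'Windows cannot find the file rm'. In that case, try adding the following at the very beginning of the code:

torch.cuda.set_device(0)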

6. Running test.py and eval.py under nosetests fails with '_jb_nosetest_runner.py: error: unrecognized arguments: ':

Replace

args = parser.parse_args()

with:

args, unknown = parser.parse_known_args()
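The difference between the two calls can be seen with a small standalone parser. This is a sketch: the --batch_size option and the extra --runner-flag argument are invented for the example and stand in for whatever the test runner injects.

```python
import argparse

parser = argparse.ArgumentParser(description="illustrative parser")
parser.add_argument("--batch_size", type=int, default=32)

# Simulated command line: one known option plus an extra one injected
# by a test runner (the extra flag name is invented for this example).
argv = ["--batch_size", "8", "--runner-flag", "x"]

# parse_args(argv) would exit with "unrecognized arguments";
# parse_known_args returns the leftovers instead of failing.
args, unknown = parser.parse_known_args(argv)

print(args.batch_size)
print(unknown)
```

The script then only reads the options it defined, and anything the runner added ends up in unknown, where it can be ignored.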

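PyTorch changed substantially in the upgrade from versions 0.1-0.3 to 0.4 (1.0), and many algorithm implementations were written for versions below 0.4. For convenience, PyTorch 0.4 is now used throughout, but training a model with this source code triggers several version-related bugs.
The following uses ssd.pytorch as an example to fix these problems and records what to watch for when migrating from 0.2 to 0.4.
1 Path issues

For example:
[screenshot] Change these to match your own paths.
2 Fixing bugs in the training code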

Addendum: RuntimeError: randperm is only implemented for CPU

[screenshot]
Solution:
[screenshot] [screenshot]
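The original fix is only shown as a screenshot. One possible workaround, assuming the error comes from randperm being asked to produce a CUDA tensor (for example because the default tensor type was set to a CUDA type), is to request the permutation on the CPU explicitly. This is only a sketch and not necessarily the change the author made; the size n is illustrative.

```python
import torch

n = 10  # illustrative dataset size

# Request the permutation on the CPU explicitly, regardless of the
# default tensor type or device configured elsewhere in the script.
perm = torch.randperm(n, device="cpu")

# Indices can be moved to the GPU afterwards if they are needed there.
if torch.cuda.is_available():
    perm_gpu = perm.to("cuda")

indices = perm.tolist()
print(sorted(indices) == list(range(n)))
```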

1) RuntimeError: The shape of the mask [2, 8732] at index 0 does not match the shape of the indexed tensor [17464, 1] at index 0
[screenshot]
Solution:
[screenshot]
2) RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #3 'other'
[screenshot]
Solution:
[screenshot]
3) There are also some UserWarnings, such as the one below; fix them as the message suggests.
UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
[screenshot]
Solution:
[screenshot] [screenshot]
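As the warning says, the replacement for volatile is the torch.no_grad() context manager. A minimal sketch, with a toy tensor standing in for the network input:

```python
import torch

x = torch.ones(3, requires_grad=True)

# Pre-0.4 inference code used Variable(x, volatile=True); since 0.4 the
# same effect comes from wrapping the forward pass in torch.no_grad().
with torch.no_grad():
    y = x * 2  # no autograd graph is recorded inside this block

print(y.requires_grad)
```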

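4) Dataset issues
Some images in my own dataset contain no objects at all, so the corresponding XML files have no ground-truth xmin, ymin, xmax, ymax entries.
**Solution:** Clean the dataset by removing the images that lack this information, together with their associated files.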
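A check like the following could be used to find such annotations. It is a sketch that assumes VOC-style XML, where each object element holds a bndbox with xmin, ymin, xmax, ymax; the helper name has_boxes and the two sample annotations are invented for illustration.

```python
import xml.etree.ElementTree as ET

def has_boxes(xml_text):
    """Return True if the annotation has at least one complete bndbox."""
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:
            continue
        fields = [box.findtext(k) for k in ("xmin", "ymin", "xmax", "ymax")]
        if all(f is not None and f.strip() for f in fields):
            return True
    return False

empty = "<annotation><filename>a.jpg</filename></annotation>"
labeled = (
    "<annotation><object><name>cat</name><bndbox>"
    "<xmin>1</xmin><ymin>2</ymin><xmax>30</xmax><ymax>40</ymax>"
    "</bndbox></object></annotation>"
)

print(has_boxes(empty), has_boxes(labeled))
```

Files for which has_boxes returns False are the candidates to remove, along with their images and list entries.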

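In addition, during training:
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 276, in __next__
    raise StopIteration
StopIteration
[screenshot]
Solution:
This error occurs because training has iterated over the data once and the iterator was not restarted.
[screenshot]
In train.py, the original code is at line 165; replace it with the content between the two "#" lines.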
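Restarting the iterator when it runs out can be sketched as follows. The batches generator is a hypothetical stand-in for the DataLoader so the example runs without a dataset, and this is not necessarily the exact code from the screenshot.

```python
def batches(data, batch_size):
    """Yield consecutive batches; a stand-in for a DataLoader."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(5))          # toy dataset
batch_iterator = batches(data, 2)
seen = []

for iteration in range(5):     # more iterations than one epoch holds
    try:
        batch = next(batch_iterator)
    except StopIteration:
        # The epoch is exhausted: build a fresh iterator and continue
        batch_iterator = batches(data, 2)
        batch = next(batch_iterator)
    seen.append(batch)

print(seen)
```

In train.py the except branch would presumably recreate the iterator with something like batch_iterator = iter(data_loader), assuming data_loader is the name of the DataLoader there.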

5) Other issues
3 Fixing bugs in the test code

1) Keep paths and test data files consistent

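2) dim() in eval.py and layers/functions/detection.py
[screenshot]
[screenshot]
Solution:
At lines 407 and 65 of these two files respectively, the original condition is if *.dim() == 0:. After migrating from 0.2 to 0.4 this no longer works: taking scores.dim() above as an example, when scores is empty, i.e. tensor([]), scores.dim() == 1. So change the condition to if torch.numel(scores) == 0: (this returns the number of elements in the scores tensor).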
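The behaviour is easy to confirm on an empty tensor. A small sketch, where the result variable is only there to make the branch visible:

```python
import torch

scores = torch.tensor([])  # empty result, e.g. no detections kept

print(scores.dim())    # number of dimensions of the empty tensor
print(scores.numel())  # number of elements

# dim() == 0 is never true for tensor([]), so test the element count
if scores.numel() == 0:
    result = "skip"    # nothing to process for this class
else:
    result = "process"

print(result)
```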
