Bugs encountered while learning

aoaozhu

Last modified 2023-09-14 10:58:28

Tags: learning

First published 2023-03-07 10:37:41

Article link: https://blog.csdn.net/qq_43381493/article/details/129377476

Assorted bugs

1.transforms.RandomResizedCrop(224)

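`TypeError: 'tuple' object is not callable`

Cause and fix: this transform performs the random crop during data loading, and in my case the error appeared when an image was smaller than 224. More generally, the message means a tuple is being called like a function, so a missing comma or a stray pair of parentheses in the transform list is also worth checking.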
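As a small illustration of what the message itself means, here is a sketch in plain Python with no torchvision involved. The names mean, std, build_buggy and build_fixed are made up for this example, and it is only one possible way to hit the error, not necessarily what happened in the case above.

```python
# Hypothetical values standing in for arguments of a transform pipeline.
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)


def build_buggy():
    # The comma between the two tuples is missing, so Python parses
    # `mean (std)` as a function call: mean(std).
    return [mean (std)]


def build_fixed():
    # With the comma in place the list simply holds two tuples.
    return [mean, std]


try:
    build_buggy()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(build_fixed())
```

If the traceback points at the line that builds the transform list, reading that line for a missing comma is a quick first check.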

2. Bug caused by the PyTorch version (old code run on PyTorch 0.4 or later)

invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to conve

Fix: change loss.data[0] to loss.item(). In newer versions the loss is a 0-dim tensor, which cannot be indexed.
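A minimal sketch of the change, assuming a toy MSE loss; pred and target are made-up tensors, only loss.item() comes from the note above.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()

# Made-up prediction and target, just to produce a scalar loss.
pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
target = torch.tensor([1.0, 2.0, 5.0])

loss = criterion(pred, target)  # a 0-dim tensor, so loss.data[0] fails

# .item() converts the 0-dim tensor into a plain Python float.
loss_value = loss.item()
print(loss.dim(), loss_value)
```

The same applies to any other scalar that gets logged or accumulated, for example a running accuracy.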

3. Bug caused by the operating system

[Errno 32] Broken pipe

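Fix: set num_workers=0. DataLoader workers are separate processes (not threads), and on Windows they are started by re-importing the script, which commonly fails with this error when the training code is not wrapped in `if __name__ == '__main__':`. With num_workers=0 all data loading runs in the main process. The cost is slower training than on Linux, which is one drawback of training on Windows.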
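A minimal sketch of both options, assuming a toy TensorDataset; make_loader and count_batches are hypothetical helper names. Keeping the entry point under the main guard is what usually lets num_workers > 0 work on Windows, while num_workers=0 is the simple fallback.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=0):
    # Toy dataset: 10 samples with 2 features each.
    data = torch.arange(20, dtype=torch.float32).reshape(10, 2)
    labels = torch.arange(10)
    dataset = TensorDataset(data, labels)
    # num_workers=0 loads every batch in the main process.
    return DataLoader(dataset, batch_size=4, num_workers=num_workers)


def count_batches(loader):
    return sum(1 for _ in loader)


if __name__ == "__main__":
    # Anything that starts the loader belongs under this guard,
    # so that worker processes can re-import the module safely.
    print(count_batches(make_loader(num_workers=0)))
```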

4. When training a network, arrange the torch.save calls so that a .pth file is saved every so many iterations
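A rough sketch of saving every few iterations, assuming a tiny linear model and random inputs in place of a real network and dataset. The function name train_with_checkpoints, the file name pattern and the save_every value are all made up for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


def train_with_checkpoints(model, optimizer, num_iters, save_every, ckpt_dir):
    """Run a toy training loop and save a checkpoint every `save_every` iterations."""
    saved_paths = []
    for iteration in range(1, num_iters + 1):
        inputs = torch.randn(8, 4)            # random stand-in for a real batch
        loss = model(inputs).pow(2).mean()    # toy loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if iteration % save_every == 0:
            path = os.path.join(ckpt_dir, f"iter_{iteration}.pth")
            torch.save(
                {
                    "iteration": iteration,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                },
                path,
            )
            saved_paths.append(path)
    return saved_paths


model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
ckpt_dir = tempfile.mkdtemp()
saved_paths = train_with_checkpoints(
    model, optimizer, num_iters=10, save_every=5, ckpt_dir=ckpt_dir
)
print(saved_paths)
```

Saving the optimizer state and the iteration counter next to the weights makes it possible to resume training, not only to run inference.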

5. Runtime error while training a model

Runtime Error(ARRAY_BOUNDS_EXCEEDED) // array index out of bounds
Runtime Error(DIVIDE_BY_ZERO) // division by zero
Runtime Error(ACCESS_VIOLATION) // illegal memory access
Runtime Error(STACK_OVERFLOW) // stack overflow

Note: keep arrays from getting too large, otherwise memory runs out. Calling torch.save too often can also exceed the limit. Both cases are reported as a Runtime error.

6. If the path to the trained weights file is wrong, or the saved model is incomplete or corrupted, the following appears

PytorchStreamReader failed reading zip archive: failed finding central directory

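Fix: check that the path is correct, including whether it needs a trailing /. When saving with torch.save, save only after a full training epoch has finished and let the write complete. A checkpoint whose write was cut short is an incomplete zip archive and produces the error above.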
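One way to make both checks routine is sketched below. It is only an illustration: save_atomically and load_checked are hypothetical helpers, not PyTorch functions. The idea is to write to a temporary file first and rename it afterwards, so an interrupted save never leaves a half-written file under the final name, and to verify the path before loading.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_atomically(state, path):
    # Write to a temporary file, then rename it over the final path.
    tmp_path = path + ".tmp"
    torch.save(state, tmp_path)
    os.replace(tmp_path, path)


def load_checked(path):
    # Fail early with a clear message when the path is wrong.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"checkpoint not found: {path}")
    return torch.load(path, map_location="cpu")


model = nn.Linear(4, 2)
ckpt_dir = tempfile.mkdtemp()
ckpt_path = os.path.join(ckpt_dir, "model.pth")

save_atomically(model.state_dict(), ckpt_path)
restored = load_checked(ckpt_path)
print(sorted(restored.keys()))
```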

7. Error in matrix multiplication

h = torch.mm(input_h, self.W)
RuntimeError: self must be a matrix

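Cause: torch.mm() multiplies two matrices, that is, two 2-D tensors. If either tensor has more than two dimensions, it raises an error.
Here the two tensors have shapes [16, 16, 29] and [29, 70].
Use torch.matmul() instead.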
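A small check with the same shapes, using random values in place of the real input_h and W:

```python
import torch

input_h = torch.randn(16, 16, 29)  # 3-D input, same shape as in the note
W = torch.randn(29, 70)            # 2-D weight matrix

# torch.mm only accepts 2-D tensors, so this raises RuntimeError.
try:
    torch.mm(input_h, W)
    mm_error = None
except RuntimeError as exc:
    mm_error = str(exc)

# torch.matmul broadcasts W over the leading (batch) dimension.
h = torch.matmul(input_h, W)
print(h.shape)
```

torch.matmul applies W to the last dimension of every 16 x 29 slice, which gives the same result as flattening the first two dimensions, calling torch.mm, and reshaping back.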

8. Error from torch.where()

attention = torch.where(adj > 0, e, zero_vec)
RuntimeError: The size of tensor a (8) must match the size of tensor b (512) at non-singleton dimension 2

Here condition, a and b must have the same shape, or at least shapes that broadcast against each other; otherwise the error above appears.
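A sketch of the matching and the mismatching case. The shapes and the fill value -9e15 are assumptions for illustration (a large negative constant is a common choice before a softmax); only the torch.where call itself comes from the snippet above.

```python
import torch

adj = torch.randint(0, 2, (2, 8, 8))   # made-up adjacency matrices
e = torch.randn(2, 8, 8)               # made-up attention scores
zero_vec = torch.full_like(e, -9e15)   # assumed fill value

# All three arguments have shape (2, 8, 8), so this works.
attention = torch.where(adj > 0, e, zero_vec)

# A last dimension of 512 cannot broadcast against 8.
bad_e = torch.randn(2, 8, 512)
try:
    torch.where(adj > 0, bad_e, zero_vec)
    mismatch_error = None
except RuntimeError as exc:
    mismatch_error = str(exc)

print(attention.shape)
print(mismatch_error)
```

Printing adj.shape, e.shape and zero_vec.shape right before the call is usually the fastest way to find which tensor is off.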

9. pytest error

E       fixture 'args' not found

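The function was named test ['def test():']. pytest collects any function whose name starts with test and treats its parameters as fixtures, so an ordinary function that takes args triggers this error. Renaming it, for example to tes ['def tes():'], is enough.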
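A sketch of the naming convention, with made-up names: evaluate is an ordinary helper that takes args, and test_evaluate is the function pytest is meant to collect. The Namespace fields are hypothetical.

```python
from argparse import Namespace


def evaluate(args):
    # Ordinary helper. Its name does not start with "test",
    # so pytest does not try to resolve `args` as a fixture.
    return args.batch_size * args.num_batches


def test_evaluate():
    # A real test takes no undeclared parameters and builds its own inputs.
    args = Namespace(batch_size=4, num_batches=3)
    assert evaluate(args) == 12
```

Any descriptive name that does not begin with test works for the helper.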

10. images.grad is None
Part of the code:

    images = Variable(images.to(torch.device('cuda')), requires_grad=True)
    loss = criterion(outputs, targets)  # compute the loss
    loss.requires_grad_(True)
    loss.backward()  # one backward pass
    # Generate perturbation
    grad_j = torch.sign(images.grad.data)  # sign of the gradient w.r.t. the input

The last line then raises:

    grad_j = torch.sign(images.grad.data)  # sign of the gradient w.r.t. the input
    AttributeError: 'NoneType' object has no attribute 'data'

Cause and fix:
https://www.jianshu.com/p/29ed5b202db1
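One common cause of this symptom, which may or may not be the one described at the link above, is that outputs was computed before images was turned into a tensor that requires gradients, so images never enters the computation graph. A minimal CPU sketch under that assumption follows; the linear model and random data are stand-ins, not the original network.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)             # hypothetical stand-in for the real network
criterion = nn.CrossEntropyLoss()
images = torch.randn(2, 4)          # made-up input batch
targets = torch.tensor([0, 2])

# Make the input a leaf tensor that tracks gradients first...
images = images.clone().detach().requires_grad_(True)

# ...and only then run the forward pass, so the loss depends on it.
outputs = model(images)
loss = criterion(outputs, targets)
loss.backward()

# images.grad is now populated; take its sign for the perturbation.
grad_j = torch.sign(images.grad)
print(grad_j.shape)
```

With the graph connected this way, loss.requires_grad_(True) is unnecessary. Needing that call is itself a hint that the loss was not connected to anything that requires gradients.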

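11. If neither the optimizer nor the optimization loop raises an error but the loss never changes, check whether every leaf node in the computation has requires_grad set to True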
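A quick way to run that check is sketched below. find_frozen_parameters is a hypothetical helper, and the small Sequential model with a deliberately frozen first layer is only there to give it something to report.

```python
import torch
import torch.nn as nn


def find_frozen_parameters(model):
    """Return the names of leaf parameters that do not require gradients."""
    return [
        name
        for name, param in model.named_parameters()
        if param.is_leaf and not param.requires_grad
    ]


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Freeze the first layer on purpose to simulate the problem.
for param in model[0].parameters():
    param.requires_grad_(False)

frozen = find_frozen_parameters(model)
print(frozen)
```

If the list is empty and the loss still does not move, the next things to look at are whether param.grad is None after backward() and whether the optimizer was built from the same parameters.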
