Common PyTorch Errors

Latest recommended article published on 2024-07-22 08:45:00

Author: 答案是你

Tags: python, artificial intelligence, deep learning, network

Article link: https://blog.csdn.net/qq_44419614/article/details/109997501

1. Error: ValueError: num_samples should be a positive integer value, but got num_samples=0
Possible cause: len(self.data_info) == 0 in the Dataset that was passed in, i.e. the dataset handed to the DataLoader contains no data.
Solution:
Check the paths used in the dataset.
Check why the Dataset's __len__() returns 0.
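
One way to catch this early is to check the dataset length before building the DataLoader. The sketch below is only an illustration: ImageListDataset, build_loader, and the directory scan are hypothetical stand-ins for whatever path logic your own Dataset uses; only data_info comes from the error description above.

```python
import os
from torch.utils.data import Dataset, DataLoader


class ImageListDataset(Dataset):
    """Hypothetical dataset that collects file paths from a directory."""

    def __init__(self, root):
        # A wrong root silently produces an empty list here.
        if os.path.isdir(root):
            self.data_info = [os.path.join(root, n) for n in sorted(os.listdir(root))]
        else:
            self.data_info = []

    def __len__(self):
        return len(self.data_info)

    def __getitem__(self, idx):
        return self.data_info[idx]


def build_loader(dataset, batch_size=4):
    # Fail with a readable message before the sampler complains.
    if len(dataset) == 0:
        raise ValueError("dataset is empty; check the data path and __len__()")
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)
```

Printing len(dataset) right after constructing it is usually enough to tell whether the path or the __len__() implementation is at fault.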

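2. Error: TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>
Possible cause: the current operation expects a PIL Image or an ndarray, but a Tensor was passed in.
Solution:
Check whether ToTensor() appears twice in the transform.
Check how the data type changes after each operation in the transform.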
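
To follow the type changes step by step, a small tracing helper can be useful. This is a minimal sketch: trace_types is a hypothetical helper, and the two lambda steps are stand-ins for real transforms. For a torchvision pipeline, the list stored in the transforms attribute of a transforms.Compose object could be passed in instead.

```python
import torch


def trace_types(steps, sample):
    """Apply each transform step in order and record the type it returns."""
    seen = [type(sample).__name__]
    for step in steps:
        sample = step(sample)
        seen.append(type(sample).__name__)
    return sample, seen


# Stand-in steps: nested list -> Tensor -> scaled Tensor.
steps = [
    lambda x: torch.tensor(x, dtype=torch.float32),
    lambda t: t / 255.0,
]
out, seen = trace_types(steps, [[0, 128], [255, 64]])
print(" -> ".join(seen))
```

The step at which the recorded type first becomes Tensor is the point after which operations that expect a PIL Image or ndarray will fail.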

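3. Error: RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 93 and 89 in dimension 1 at /Users/soumith/code/builder/wheel/pytorch-src/aten/src/TH/generic/THTensorMath.cpp:3616
Possible cause: the Dataset's __getitem__ function returns images of different shapes, so the DataLoader cannot stack them into a batch.
Solution: check the operations in __getitem__ and make sure every sample comes out with the same shape.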
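
A minimal sketch of the fix, assuming the images can simply be resized to a common size: SyntheticImages is a hypothetical dataset that produces blank images of the given heights and widths, and the 64 x 64 target size is an arbitrary choice for illustration.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader


class SyntheticImages(Dataset):
    """Hypothetical dataset yielding blank 3-channel images of varying size."""

    def __init__(self, sizes, out_size=None):
        self.sizes = sizes
        self.out_size = out_size

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, idx):
        h, w = self.sizes[idx]
        img = torch.zeros(3, h, w)
        if self.out_size is not None:
            # Resize every sample to the same (H, W) so the batch can be stacked.
            img = F.interpolate(
                img.unsqueeze(0), size=self.out_size,
                mode="bilinear", align_corners=False,
            ).squeeze(0)
        return img


sizes = [(93, 80), (89, 80)]
loader = DataLoader(SyntheticImages(sizes, out_size=(64, 64)), batch_size=2)
batch = next(iter(loader))
print(batch.shape)
```

Leaving out_size as None reproduces the problem: the two samples then have heights 93 and 89, and stacking them into a batch fails.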

4. Error:
conv: RuntimeError: Given groups=1, weight of size 6 1 5 5, expected input [16, 3, 32, 32] to have 1 channels, but got 3 channels instead
linear: RuntimeError: size mismatch, m1: [16 x 576], m2: [400 x 120] at …/aten/src/TH/generic/THTensorMath.cpp:752
Possible cause: the input to a layer does not match that layer's parameters.
Solution:
Check whether the layers before and after the failing one are defined correctly.
Check the shape of the input data.
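
One way to check the shapes is to push a dummy batch through the convolutional part and read off the flattened size before defining the linear layer. The network below is an assumed LeNet-style stack chosen to match the numbers in the messages above (3 input channels, 400 features into a 120-unit layer); it is not taken from the original code.

```python
import torch
import torch.nn as nn

# in_channels must match the data: 3 for RGB input, not 1.
features = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
)

# Dummy batch with the same shape as the real input.
x = torch.zeros(16, 3, 32, 32)
with torch.no_grad():
    flat = features(x).flatten(1)
print(flat.shape)  # the second dimension is what nn.Linear must accept

fc = nn.Linear(flat.shape[1], 120)
out = fc(flat)
```

In the linear message above, m1: [16 x 576] says each sample arrives with 576 features while the layer was defined for 400, so either the input size or the layer's in_features has to change.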

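5. Error: AttributeError: 'DataParallel' object has no attribute 'linear'
Possible cause: for parallel training the model is wrapped in DataParallel, which places the original network under an attribute named module, so the layer has to be accessed as net.module.linear.
Solution:
Add module. in front of the layer name.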
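
The access pattern can be sketched as follows. FooNet and unwrap are hypothetical names used only for illustration; the helper returns the underlying network whether or not it has been wrapped.

```python
import torch.nn as nn


class FooNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)


def unwrap(model):
    """Return the underlying network, with or without a DataParallel wrapper."""
    return model.module if isinstance(model, nn.DataParallel) else model


net = nn.DataParallel(FooNet())
layer = unwrap(net).linear  # same object as net.module.linear
```

Going through a helper like this keeps the rest of the code unchanged when the wrapper is added or removed.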

6. Error: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Possible cause: a model trained and saved on a GPU cannot be loaded directly on a machine without a GPU.
Solution:
Set map_location="cpu" when calling torch.load.
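
A minimal sketch of loading with map_location. The file name checkpoint.pth and the small nn.Linear model are placeholders; the checkpoint here is written on whatever device is available, so the example only demonstrates the loading call, not the failure itself.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")  # placeholder path
torch.save(model.state_dict(), path)

# map_location remaps every stored tensor onto the CPU while loading.
state = torch.load(path, map_location="cpu")
model.load_state_dict(state)
```

After loading, the model can still be moved to a GPU with model.to(...) on machines that have one.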

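7. Error: AttributeError: Can't get attribute 'FooNet2' on <module '__main__' from '…
Possible cause: the class of the saved network is not defined in the current Python script.
Solution:
Define the network class before loading. This matters whenever the entire model object was saved (not just its parameters) and has to be loaded back.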

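8. Error: RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at …/aten/src/THNN/generic/ClassNLLCriterion.c:94
Possible cause: a label value is greater than or equal to the number of classes, so cur_target < n_classes does not hold. This usually happens because the labels start at 1 instead of 0. The error typically shows up with the cross-entropy loss.
Solution: change the labels so that they start at 0. For example, the labels for 10-class classification should take the values 0-9.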
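
A short sketch of the label shift, assuming the labels are 1-based integers. The tensors are made-up example data, and the range check is an extra safeguard added here for illustration.

```python
import torch
import torch.nn as nn

n_classes = 10
labels = torch.tensor([1, 5, 10, 3])  # 1-based labels; 10 is out of range
logits = torch.zeros(4, n_classes)    # dummy model outputs

labels0 = labels - 1                  # shift to the 0..n_classes-1 range
assert labels0.min() >= 0 and labels0.max() < n_classes

loss = nn.CrossEntropyLoss()(logits, labels0)
```

Running the same range check on the raw labels once, right after loading the data, points at the problem before the loss function does.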

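9. Error:
RuntimeError: expected device cuda:0 and dtype Long but got device cpu and dtype Long
Expected object of backend CPU but got backend CUDA for argument #2 'weight'
Possible cause: the two pieces of data involved in the computation are not on the same device.
Solution: use the to() function to move the data onto the same device.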
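
A minimal sketch of moving both operands to one device. The tensor names are illustrative, and the device is chosen at run time so the snippet also works on a CPU-only machine.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

weight = torch.randn(5, 3)         # created on the CPU by default
indices = torch.tensor([0, 2, 4])  # also on the CPU

# Move both operands to the same device before combining them.
weight = weight.to(device)
indices = indices.to(device)
rows = weight[indices]
```

Note that to() returns a new tensor rather than moving the original in place, so the result has to be assigned back.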

10. Error: RuntimeError: DataLoader worker (pid 27) is killed by signal: Killed. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
Possible cause: not enough memory (host RAM, not GPU memory).
Solution: request more memory.

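11. Error: RuntimeError: reduce failed to synchronize: device-side assert triggered
Possible cause: with the BCE loss function the input must lie between 0 and 1; the model has no sigmoid activation at the end, so its output falls outside that range.
Solution: make sure the model's output lies in [0, 1].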
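
Two common ways to satisfy the range requirement are sketched below with made-up logits and targets: applying a sigmoid before nn.BCELoss, or switching to nn.BCEWithLogitsLoss, which applies the sigmoid internally. The second option is an alternative not mentioned in the original text.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0], [-1.5], [0.3]])   # raw outputs, outside [0, 1]
targets = torch.tensor([[1.0], [0.0], [1.0]])

# Option 1: squash the outputs into [0, 1] before BCELoss.
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, targets)

# Option 2: pass the raw logits to BCEWithLogitsLoss.
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)
```

Both options compute the same quantity up to floating-point error.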

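12. Error: RuntimeError: unexpected EOF. The file might be corrupted.
Possible cause: torch.load fails while loading the model because the file was damaged in transfer.
Solution: transfer the model file again.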

13. Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
Possible cause: the file was saved with Python 2 and is being loaded with Python 3.
Solution: pass encoding='iso-8859-1' to torch.load:
check_p = torch.load(path, map_location="cpu", encoding='iso-8859-1')

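14. Error: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
Possible cause: the data tensor has already been moved to the GPU, but the model parameters are still on the CPU, so the computation cannot proceed.
Solution: call model.cuda() to move the model to the GPU, or equivalently model.to("cuda").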
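
A minimal sketch that keeps the model and its input on one device. The nn.Conv2d layer stands in for a real model, and the device is picked at run time so the snippet falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Conv2d(3, 6, kernel_size=5).to(device)  # parameters moved to device
x = torch.zeros(16, 3, 32, 32, device=device)      # input created on the same device

out = model(x)
```

Defining device once and using it for both the model and every batch avoids mixing CPU and GPU tensors.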
