Errors I've Hit in PyTorch and How I Fixed Them (Continuously Updated)

tju_tonge

Category: ML   Tags: python pytorch

Article link: https://blog.csdn.net/qq_45589658/article/details/106951555

1. Output dimension does not match the number of classes in the labels

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at C:\w\1\s\tmp_conda_3.7_055457\conda\conda-bld\pytorch_1565416617654\work\aten\src\THNN/generic/ClassNLLCriterion.c:94

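Check whether the dimension of the final output matches the number of classes. For example, my dataset had 100 classes, so for a batch of 64 the output should have been 64×100, but my program produced a 64×20 tensor, which made every label of 20 or above an out-of-range class index.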
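A quick sanity check before the loss catches this early. The sketch below assumes CIFAR-100-sized inputs (3×32×32 images, 100 classes) and a hypothetical one-layer model; the point is that the last layer's output width must equal the number of classes and every label must lie in [0, num_classes).

```python
import torch
import torch.nn as nn

num_classes = 100  # e.g. CIFAR-100 has 100 classes
batch_size = 64

# Hypothetical toy model: the last layer must produce one logit per class
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))

images = torch.randn(batch_size, 3, 32, 32)
labels = torch.randint(0, num_classes, (batch_size,))

outputs = model(images)

# Sanity checks before computing the loss: output width == number of classes,
# and every label is a valid class index in [0, num_classes)
assert outputs.shape == (batch_size, num_classes), outputs.shape
assert 0 <= labels.min().item() and labels.max().item() < outputs.shape[1]

loss = nn.CrossEntropyLoss()(outputs, labels)
print(outputs.shape, loss.item())
```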

2. Broken pipe

BrokenPipeError: [Errno 32] Broken pipe

The error reported in the Jupyter backend (server console):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "F:\Anaconda\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "F:\Anaconda\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "F:\Anaconda\lib\site-packages\torch\__init__.py", line 81, in <module>
    from torch._C import *
ImportError: DLL load failed: The paging file is too small for this operation to complete.
[I 23:18:29.823 NotebookApp] Saving file at /MODEL/main.ipynb
[I 23:20:22.534 NotebookApp] Kernel interrupted: 60eaf290-06e5-419c-9f63-7d7ea486dd9a

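It turned out I had too many programs open, using 95% of the CPU. After closing the resource-hungry tasks and rerunning, it worked. (The underlying message says the Windows paging file is too small, i.e. the system ran short of virtual memory while the child process was loading torch's DLLs.)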
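The author's fix was to free up system resources. Judging from the traceback, the crash happened in a child process started by multiprocessing (spawn), which re-imports torch and loads its DLLs; in a training notebook such children usually come from DataLoader workers, although the post does not show the loader. Assuming that is the case, a commonly used workaround on Windows (not what the author did) is to load data in the main process with num_workers=0. The toy TensorDataset below is hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset standing in for the real one
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# num_workers=0 loads batches in the main process, so no extra worker
# processes (each of which would import torch and load its DLLs) are spawned
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

if __name__ == "__main__":  # also needed on Windows whenever num_workers > 0
    for x, y in loader:
        pass
    print(len(loader))  # 7 batches: 100 / 16 rounded up
```

Data loading becomes slower without workers, so this trades speed for stability; once memory is no longer tight, a small num_workers value can be tried again.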

3. Type mismatch

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target'
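The post gives no fix for this one. In PyTorch, NLLLoss and CrossEntropyLoss expect class-index targets of dtype torch.long (int64), so float labels (for example, read from a CSV file) need to be cast first. A minimal sketch with made-up logits and labels:

```python
import torch
import torch.nn as nn

outputs = torch.randn(4, 3)                  # logits for 4 samples, 3 classes
labels = torch.tensor([0.0, 2.0, 1.0, 2.0])  # float labels, e.g. parsed from a file

criterion = nn.CrossEntropyLoss()
# Class-index targets must be int64 (Long), so cast before computing the loss
loss = criterion(outputs, labels.long())
print(labels.long().dtype, loss.item())
```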

4. torch.log() does not support tensors of type Long

RuntimeError: log_vml_cpu not implemented for 'Long'

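Convert the Long tensor to floating point first, e.g. x.float(). torch.tensor(x, dtype=torch.float) also works when x is a Python list, but on an existing tensor it makes a copy and PyTorch warns that x.clone().detach() is preferred.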
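A minimal sketch of the cast; the explicit conversion works regardless of PyTorch version and makes the intended dtype obvious:

```python
import torch

x = torch.tensor([1, 2, 4])  # integer literals give an int64 (Long) tensor
print(x.dtype)               # torch.int64

# Cast explicitly to float before calling torch.log
y = torch.log(x.float())
print(y.dtype)               # torch.float32
```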

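5. Error when defining the loss
At first I wrote it like this (naming things is hard for me):
(screenshot of the original code not available)
and it raised the following error: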

RuntimeError: bool value of Tensor with more than one value is ambiguous

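I had written it in a hurry; it should have been written like this:
(screenshot of the corrected code not available)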
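The original screenshots did not survive, so the exact code is unknown. One common way to get this exact error, offered as an assumption rather than a reconstruction of the author's code, is to pass the tensors to the loss class's constructor instead of to the constructed loss object:

```python
import torch
import torch.nn as nn

outputs = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 2])

# Wrong: passing tensors to the constructor. They land in the constructor's
# `weight` and `size_average` slots, and evaluating a multi-element tensor
# as a bool raises the "... value of Tensor ... is ambiguous" RuntimeError
try:
    nn.CrossEntropyLoss(outputs, labels)
    raised = False
except RuntimeError as e:
    raised = True
    print("RuntimeError:", e)

# Right: create the loss module first, then call it on (input, target)
criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, labels)
print(loss.item())
```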
6. print() output does not appear immediately, only all at once after the script finishes

There are two ways to fix this:
1) In Python 3, pass the flush argument to print():

print("output", flush=True)

2) Run the script with -u, which makes Python write its output unbuffered:

python -u train.py
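If many print calls need flushing, a small wrapper saves repeating the argument. This functools.partial helper (the name log is hypothetical) is just one way to do it; setting the environment variable PYTHONUNBUFFERED=1 has the same effect as -u.

```python
import functools
import io

# A print() that always flushes, so progress lines show up immediately
log = functools.partial(print, flush=True)

for epoch in range(3):
    log(f"epoch {epoch} finished")

# Keyword arguments still pass through, e.g. writing to a file-like object
buf = io.StringIO()
log("hello", file=buf)
```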

7. Encoding error when reading a file

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The file is binary, not UTF-8 text, so change the original open('/data/cifar-100-python/meta', 'r') to binary mode:

with open('/data/cifar-100-python/meta', 'rb') as f1:
    data = f1.read()
    print(data)
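Reading the raw bytes avoids the UnicodeDecodeError, but the CIFAR-100 meta file is a Python pickle, so what you usually want is to unpickle it. A minimal sketch along the lines of the loader suggested on the CIFAR dataset page; the unpickle helper name and the small stand-in demo file are illustrative, not part of the post:

```python
import os
import pickle
import tempfile

def unpickle(path):
    # CIFAR files are pickles (binary data): open in 'rb' and unpickle instead
    # of decoding as UTF-8 text. encoding="bytes" lets Python 3 read pickles
    # written by Python 2; their keys then come back as bytes (b'fine_label_names').
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")

# Self-contained demo with a hypothetical stand-in file;
# for the real data you would call unpickle('/data/cifar-100-python/meta')
demo_path = os.path.join(tempfile.mkdtemp(), "meta")
with open(demo_path, "wb") as f:
    pickle.dump({"fine_label_names": ["apple", "aquarium_fish"]}, f)

meta = unpickle(demo_path)
print(meta["fine_label_names"])
```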

8. Saving a dict to an .npy file and reading it back

Saving it like this works fine:

import numpy as np
infodict = {'train_num': 20, 'test_num': 20}
np.save('./save/data_divide_info.npy', infodict)

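When reading it back, accessing a['train_num'] directly on the loaded array raises the error below, because np.save stores the dict inside a 0-d object array, which cannot be indexed with a string: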

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

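Reading it like this works (newer NumPy versions also require allow_pickle=True, because the dict is stored as a pickled object):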

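import numpy as np
filepath = 'xxx/info.npy'
a = np.load(filepath, allow_pickle=True)  # required for object arrays since NumPy 1.16.3
print("dict = ", a.item())  # .item() unwraps the 0-d array into the original dict
print("dict['train_num'] = ", a.item()['train_num'])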
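A self-contained round trip of the save/load steps above, writing to a temporary directory instead of ./save/ (the temp path is an assumption for the demo). It shows why string indexing fails: np.save stores the dict as a 0-d object array, and .item() unwraps it.

```python
import os
import tempfile
import numpy as np

infodict = {"train_num": 20, "test_num": 20}
path = os.path.join(tempfile.mkdtemp(), "data_divide_info.npy")

np.save(path, infodict)      # the dict is wrapped in a 0-d object array

a = np.load(path, allow_pickle=True)
print(a.shape, a.dtype)      # () object
loaded = a.item()            # unwrap the 0-d array to get the dict back
print(loaded["train_num"])   # 20
```

If the file only ever holds a dict, json or pickle may be a more natural format; .npy is designed for arrays.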

9. Parameter order

SyntaxError: non-default argument follows default argument

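The cause is placing a parameter without a default value after a parameter that has one. The fix is to reorder the parameters so that all parameters without defaults come first.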
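A small illustration with a hypothetical build_model function. The bad signature is compiled from a string so the script can show the SyntaxError without failing to load; newer Python versions word the message differently, so the test only checks that the error occurs.

```python
# def build_model(lr=0.01, num_classes): ...   # SyntaxError
def build_model(num_classes, lr=0.01):
    # Parameters without defaults come first, defaulted ones after
    return {"num_classes": num_classes, "lr": lr}

# Compile the bad signature at runtime to show the error without crashing
try:
    compile("def f(lr=0.01, num_classes): pass", "<demo>", "exec")
    error = None
except SyntaxError as e:
    error = e.msg
print(error)
```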

10. CUDA vs. CPU errors

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_nll_loss_forward

I checked the model's output, which should have been on CUDA but turned out to be on the CPU:

print(outputs.device)

Then I appended .cuda() to the model output (outputs), and the error changed to:

RuntimeError: CUDA error: an illegal memory access was encountered

In the end I found that the model itself had never been moved to CUDA; adding this line fixed it:

clsnet = nn.DataParallel(clsnet).cuda()
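A device-agnostic sketch of the same fix: choose the device once, then move both the model and every input and label tensor to it before the forward pass. The toy nn.Linear model and the tensor shapes are assumptions standing in for the real clsnet; the DataParallel wrapping mirrors the line above but is applied only when more than one GPU is present, and the whole thing also runs on a CPU-only machine.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

clsnet = nn.Linear(10, 5)             # hypothetical stand-in for the real model
if torch.cuda.device_count() > 1:
    clsnet = nn.DataParallel(clsnet)  # optional multi-GPU wrapper
clsnet = clsnet.to(device)            # move the parameters, not just the outputs

inputs = torch.randn(8, 10).to(device)
labels = torch.randint(0, 5, (8,)).to(device)

outputs = clsnet(inputs)
loss = nn.CrossEntropyLoss()(outputs, labels)
print(outputs.device, labels.device)  # both on the same device
```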

11. A parameter name that shadows a built-in
A variable a was a list, but calling type(a) to print its type raised the following error:

TypeError: 'str' object is not callable

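It turned out that the class had a str parameter named type, and type is a Python built-in function (not a keyword), so inside that method the name type referred to the string instead: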

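def __init__(self, type, modeltype):
    # the parameter 'type' shadows the built-in type() inside this method,
    # so calling type(a) here raises TypeError: 'str' object is not callable
    if type == 'train':
        ...
    elif type == 'test':
        ...
    # __init__ must return None, so no "return True" here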

Renaming the type parameter solved it.
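The shadowing can be reproduced in a few lines; the functions broken and fixed are hypothetical, standing in for the class's __init__:

```python
a = [1, 2, 3]

def broken(type):
    # The parameter named 'type' shadows the built-in type() in this function
    return type(a)

try:
    broken("train")
    msg = None
except TypeError as e:
    msg = str(e)
print(msg)             # 'str' object is not callable

def fixed(mode):
    # After renaming the parameter, type() is the built-in again
    return type(a)

print(fixed("train"))  # <class 'list'>
```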

12. Error when calling a custom method of the model
model.printParameter()

AttributeError: 'DataParallel' object has no attribute 'printParameter'

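The cause is that the model had been wrapped with model = nn.DataParallel(model), so the method has to be called through the wrapped module: model.module.printParameter() no longer raises the error.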
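A minimal reproduction, assuming a hypothetical toy ClsNet module whose printParameter() method (the method name is from the post, its body is made up) counts parameters. nn.DataParallel keeps the original model in its .module attribute and does not expose the model's custom methods; on a machine without GPUs it simply wraps the module, so this runs on CPU too.

```python
import torch.nn as nn

class ClsNet(nn.Module):  # hypothetical model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def printParameter(self):
        n = sum(p.numel() for p in self.parameters())
        print(f"{n} parameters")
        return n

model = nn.DataParallel(ClsNet())
# model.printParameter() would raise AttributeError: custom methods
# live on the wrapped module, not on the DataParallel wrapper
inner = model.module if isinstance(model, nn.DataParallel) else model
count = inner.printParameter()  # 4*2 weights + 2 biases = 10
```

The isinstance check lets the same code work whether or not the model is wrapped.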

13. Error in the test function while the train function works fine

IndexError: dimension specified as 0 but tensor has no dimensions

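This happened because my test set had 601 samples and the test batch size was 100, so one sample was left over as a batch of its own, with labels of shape (1, 1). squeeze() removes every size-1 dimension and turns it into a 0-d tensor with shape [], hence "tensor has no dimensions". The fix is to change the test batch size so that dividing the number of samples by the batch size does not leave a remainder of 1.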
Addendum: in the training set, the same problem showed up as this error:

---> for lb_idx in range(len(labels)):
TypeError: len() of a 0-d tensor
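The shape problem can be reproduced directly. Besides the author's fix (choosing a batch size that doesn't leave a single leftover sample), two alternatives, which are suggestions rather than the post's method, are to squeeze only a specific dimension, or to pass drop_last=True to the DataLoader so the incomplete last batch is skipped (that discards samples, which is usually unwanted for evaluation).

```python
import torch

print(601 % 100)                              # 1 leftover sample in the last batch

labels = torch.zeros(1, 1, dtype=torch.long)  # labels of that last batch, shape (1, 1)

a = labels.squeeze()   # removes *all* size-1 dims -> 0-d tensor, len(a) fails
b = labels.squeeze(1)  # removes only dim 1 -> shape (1,), batch dim kept
c = labels.view(-1)    # flattens to shape (1,)
print(a.shape, b.shape, c.shape)
```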
