A collection of PyTorch errors

Latest recommended article published on 2024-06-01 18:32:06

justtoomuchforyou

Categories: PyTorch, neural network, python

Article link: https://blog.csdn.net/m0_37663482/article/details/104741857

element 0 of tensors does not require grad and does not have a grad_fn

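This problem is described on Stack Overflow [1]. I had written my own dice loss, and it raised this error because it could not be backpropagated. The cause was my use of torch.argmax(): it returns integer indices and is not differentiable, so no gradient can flow through it. Here input.size() is [10,2,513,513] and target.size() is [10,513,513]. In the end I wrote a multi-class dice loss based on softmax, and that worked.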

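    # requires: import torch.nn.functional as F
    def diceloss(self, output, target):
        # output: (N, 2, H, W) logits; target: (N, H, W) class indices
        predicted = F.softmax(output, dim=1)  # differentiable, unlike torch.argmax
        # print(np.sum(predicted[:, 0]==0),np.sum(predicted[:, 0]>0),np.sum(predicted[:, 1]==0),np.sum(predicted[:, 1]>0))
        # dice_coef is the author's own helper (not shown in this post)
        a = dice_coef(predicted[:, 0], (target == 0).float())
        b = dice_coef(predicted[:, 1], (target == 1).float())
        # weighted sum of the two per-class Dice scores
        return 1 - (a*0.3 + b*0.7)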
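To make the difference between argmax and softmax concrete, here is a minimal, self-contained sketch. The `dice_coef` helper is not shown in the post, so the version below (including its `eps` smoothing term) is an assumption, as are the small tensor sizes used for illustration.

```python
import torch
import torch.nn.functional as F


def dice_coef(pred, target, eps=1e-6):
    # Hypothetical soft Dice coefficient; the post does not show its own helper.
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


def dice_loss(output, target):
    # softmax keeps the computation differentiable
    predicted = F.softmax(output, dim=1)
    a = dice_coef(predicted[:, 0], (target == 0).float())
    b = dice_coef(predicted[:, 1], (target == 1).float())
    return 1 - (a * 0.3 + b * 0.7)


output = torch.randn(2, 2, 8, 8, requires_grad=True)  # (N, C, H, W) logits
target = torch.randint(0, 2, (2, 8, 8))               # (N, H, W) class indices

hard = torch.argmax(output, dim=1)   # integer indices, cut off from the graph
print(hard.requires_grad)            # False

loss = dice_loss(output, target)
loss.backward()                      # works: gradients reach `output`
print(output.grad.shape)
```

Any loss built on `hard` would have no grad_fn, which is what the error message reports.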

RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 4 at ../aten/src/TH/generic/THTensor.cpp:702

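This error was raised in the DataLoader because some three-channel files had been mixed into the black-and-white .bmp files of the mask dataset. Reading them with matplotlib.pyplot showed that all three channels appear to hold the same values.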

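import cv2
import matplotlib.pyplot as plt
import numpy as np

mask_dir = '/home/cmz/dataset/CVC-EndoSceneStill/mask/'
for name in lines:  # lines: mask file names without extension (defined elsewhere)
    img = plt.imread(mask_dir + name + '.bmp')
    print(img.shape)
    if img.ndim != 2:
        im = np.array(img)
        print(im[:, :, 0].shape)
        # count non-zero and zero pixels per channel to compare the channels
        print(np.sum(im[:, :, 0] > 0), np.sum(im[:, :, 0] == 0))
        print(np.sum(im[:, :, 1] > 0), np.sum(im[:, :, 1] == 0))
        print(np.sum(im[:, :, 2] > 0), np.sum(im[:, :, 2] == 0))

        # writing the same 3-channel array back does not help
        cv2.imwrite(mask_dir + name + '.bmp', img)
        img = plt.imread(mask_dir + name + '.bmp')
        print('after: ', img.shape)  # this approach is useless, the shape is unchanged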

What worked: read the file, take any single channel (here the third), convert it to a PIL Image and save it over the original.

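import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# name and i come from the surrounding loop (not shown)
img = plt.imread(name+str(i)+'.bmp')
print(img.shape)
im = np.array(img)
print(im[:,:,0].shape)
new = im[:,:,2]  # keep only the third channel
print(new.shape)
new = Image.fromarray(new)
new.save(name+str(i)+'.bmp')
img = plt.imread(name+str(i)+'.bmp')
print('after: ', img.shape)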
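The channel-dropping step can also be checked before it is applied. The sketch below is an illustration only: the helper name `to_single_channel` is hypothetical, and it works on an in-memory array rather than on the .bmp files from the post. It collapses a three-channel mask to two dimensions only when all channels really are identical.

```python
import numpy as np


def to_single_channel(mask):
    """Return a 2-D mask; collapse an (H, W, C) array whose channels all match."""
    mask = np.asarray(mask)
    if mask.ndim == 2:
        return mask
    first = mask[:, :, 0]
    # Refuse to drop channels silently if they actually differ.
    for c in range(1, mask.shape[2]):
        if not np.array_equal(first, mask[:, :, c]):
            raise ValueError("channels differ; pick one explicitly")
    return first


gray = np.zeros((4, 5), dtype=np.uint8)
gray[1:3, 2:4] = 255
rgb = np.stack([gray, gray, gray], axis=-1)   # simulated 3-channel mask

fixed = to_single_channel(rgb)
print(rgb.shape, "->", fixed.shape)
```

The resulting 2-D array can then be saved with Image.fromarray(...).save(...) as in the snippet above.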

expected backend CPU and dtype Float but got backend CPU and dtype Long

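A Float tensor on the CPU was expected, but a Long tensor on the CPU was received, so the dtype has to be changed. A tensor can be converted directly with Tensor.float() or Tensor.long(), on the CPU as well as on CUDA; data that is still a numpy array can be converted with ndarray.astype('float32').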
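A short sketch of both conversion routes, using made-up label values:

```python
import numpy as np
import torch

labels = torch.tensor([0, 1, 1, 0])          # Python ints give dtype torch.int64 (Long)
as_float = labels.float()                    # torch.float32
back_to_long = as_float.long()               # torch.int64 again

arr = np.array([0, 1, 1, 0])                 # integer numpy array
from_numpy = torch.from_numpy(arr.astype('float32'))

print(labels.dtype, as_float.dtype, back_to_long.dtype, from_numpy.dtype)
```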

ValueError: Type must be a sub-type of ndarray type

I no longer remember the details; most likely a Tensor was used somewhere that required a numpy array.
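The post does not record what triggered this error, so the following is only a generic sketch of handing a tensor to numpy code, not a reproduction of the original failure.

```python
import numpy as np
import torch

t = torch.ones(2, 3, requires_grad=True)

# detach from autograd, move to the CPU, then convert to an ndarray
arr = t.detach().cpu().numpy()
print(type(arr), arr.shape)

back = torch.from_numpy(arr)   # ndarray back to a tensor
print(type(back))
```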

a leaf Variable that requires grad has been used in an in-place operation.

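My dice loss kept giving wrong results and I could not find the cause. I suspected that the loss could not be initialized as loss=0 and had to be torch.tensor([0.], requires_grad=True) instead. The result was still wrong, and this error appeared: a tensor created that way is a leaf that requires grad, so accumulating into it in place (for example with loss += ...) is not allowed.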
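A minimal sketch that reproduces the error and shows one way around it. The tensor `x` and the accumulation loop are made up for illustration.

```python
import torch

x = torch.ones(3, requires_grad=True)

# Reproduces the error: `loss` is a leaf that requires grad, and += is in-place.
loss = torch.tensor([0.], requires_grad=True)
try:
    loss += (x * 2).sum()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Accumulating out-of-place, starting from a plain number, avoids the problem.
total = 0.0
for scale in (1.0, 2.0):
    total = total + (x * scale).sum()
total.backward()
print(x.grad)
```

Starting from a plain 0 is fine because the first addition already produces a tensor that is part of the graph.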

Merging several numpy arrays into one

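The goal was to merge three ndarrays of shape (4,4) into one of shape (3,4,4). np.stack does this directly; np.vstack only gives (3,4,4) if each array is first given a leading axis of length 1, otherwise the result is (12,4). For array concatenation and splitting, see the article "Python之Numpy数组拼接，组合，连接".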
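The shapes produced by the different calls can be compared directly. The array contents below are arbitrary.

```python
import numpy as np

arrays = [np.full((4, 4), i) for i in range(3)]

stacked = np.stack(arrays, axis=0)                        # adds a new leading axis
vstacked = np.vstack(arrays)                              # concatenates the rows
vstacked_3d = np.vstack([a[None, ...] for a in arrays])   # each array is (1, 4, 4) first

print(stacked.shape, vstacked.shape, vstacked_3d.shape)
```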

TypeError: sum() received an invalid combination of arguments - got (out=NoneType,pe, ), but expected one of:

* ()
* (torch.dtype dtype)
* (tuple of ints dim, torch.dtype dtype)
didn't match because some of the keywords were incorrect: out, axis
* (tuple of ints dim, bool keepdim, torch.dtype dtype)
* (tuple of ints dim, bool keepdim)
didn't match because some of the keywords were incorrect: out, axis

The message says torch.sum was called incorrectly. I did not know why, so I tried it in isolation:

>>> a
tensor([[-0.9756, -0.1566, -0.3777,  0.2898],
        [ 0.2670, -1.2035, -1.1633,  0.7887],
        [ 0.9774, -0.2306,  0.0794, -1.3420],
        [ 1.6269, -0.0125,  0.2716,  0.3390]])
>>> a.sum(1)
tensor([-1.2202, -1.3110, -0.5160,  2.2250])
>>> a.sum(axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() received an invalid combination of arguments - got (axis=int, ), but expected one of:
 * ()
      didn't match because some of the keywords were incorrect: axis
 * (torch.dtype dtype)
      didn't match because some of the keywords were incorrect: axis
 * (tuple of ints dim, torch.dtype dtype)
 * (tuple of ints dim, bool keepdim, torch.dtype dtype)
 * (tuple of ints dim, bool keepdim)

>>> a.sum(dim=1)
tensor([-1.2202, -1.3110, -0.5160,  2.2250])
>>> torch.sum(a,dim=1)
tensor([-1.2202, -1.3110, -0.5160,  2.2250])
>>>

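It works here, but in my program the Tensor.sum(dim=1) form kept raising the error above and I had to use torch.sum(). The keywords out and axis in the message are the ones numpy passes along, so one possible cause (a guess, not verified) is that np.sum was being called on a Tensor somewhere. Some further issues are listed in [1].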
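For reference, the two spellings that worked in the session above give the same result. The input values here are chosen only so the row sums are easy to check.

```python
import torch

a = torch.arange(12, dtype=torch.float32).reshape(3, 4)

by_method = a.sum(dim=1)            # method form with the dim keyword
by_function = torch.sum(a, dim=1)   # function form
print(by_method, by_function)
```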

from torch._C import * ImportError: DLL load failed: 找不到指定的模块。 (The specified module could not be found.)

Caused by the numpy version. Downgrading from 1.17.1 to 1.16 did not help, but 1.15.1 worked; see [2].

Cannot import skimage or sklearn

This may be a numpy version problem. Either upgrade scikit-image / scikit-learn or downgrade numpy to 1.14.
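Before upgrading or downgrading, it helps to see which versions are actually installed. The following is a small sketch using only the standard library (Python 3.8 or later); the helper name `installed_versions` is hypothetical, and the distribution names are the usual PyPI names.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["numpy", "scikit-image", "scikit-learn", "torch"])
for name, version in report.items():
    print(f"{name}: {version}")
```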

process 1 terminated with exit code 1

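This occurred in the StyleGAN2 code when multi-GPU training was enabled. Setting multi-gpu to False made the error disappear; I did not investigate further.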

TypeError: Caught TypeError in replica 1 on device 1.

TypeError: forward() missing 1 required positional argument: 'input'

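The model does not receive the argument passed to it. With multiple GPUs the batch is split across devices, and during training the model replica and its data did not end up on the same GPU. Setting batch_size to an integer multiple of the number of GPUs in use solved it.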
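One way to keep every batch a multiple of the GPU count is sketched below. The GPU count, per-GPU batch size and dataset are assumed values for illustration, and `drop_last=True` goes beyond the fix described in the post: it discards the final partial batch, which would otherwise break the multiple.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

n_gpus = 2            # assumed GPU count, for illustration only
per_gpu_batch = 4     # assumed per-GPU batch size
batch_size = per_gpu_batch * n_gpus

dataset = TensorDataset(torch.randn(21, 3), torch.zeros(21, dtype=torch.long))
# drop_last=True (an addition, not from the post) discards the trailing
# 21 % 8 = 5 samples, which would not divide evenly across the GPUs.
loader = DataLoader(dataset, batch_size=batch_size, drop_last=True)

sizes = [inputs.shape[0] for inputs, _ in loader]
print(sizes)
```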

About the optimizer torch.optim.SGD

For a freshly created optimizer instance, optimizer:

optimizer.state: defaultdict(<class 'dict'>, {})
optimizer.param_groups: [{'params': [], 'lr': 0.01, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0005, 'nesterov': False, 'initial_lr': 0.01}]

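params holds the parameters that were passed in, which have requires_grad=True by default. The entries of optimizer.param_groups can be set by hand: if params is set to weight, for example, then optimizer.param_groups[0]['params'] contains weight. In addition, optimizer.state[weight]['momentum_buffer'] must have the same shape as weight, and the value of this momentum_buffer can also be set by hand.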
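The relationship between param_groups, state and momentum_buffer can be inspected with a small experiment. This is a sketch with made-up values (a 2x2 zero weight, a buffer of ones, and a loss whose gradient is all ones), assuming weight decay and dampening are left at zero.

```python
import torch

weight = torch.nn.Parameter(torch.zeros(2, 2))
optimizer = torch.optim.SGD([weight], lr=0.01, momentum=0.9)

state_before = len(optimizer.state)   # nothing is stored before the first step
tracked = optimizer.param_groups[0]['params'][0] is weight

# Set the buffer by hand; it must have the same shape as `weight`.
optimizer.state[weight]['momentum_buffer'] = torch.ones_like(weight)

weight.sum().backward()               # the gradient is all ones
optimizer.step()                      # buffer becomes 0.9 * 1 + 1

buf = optimizer.state[weight]['momentum_buffer']
print(state_before, tracked, buf.shape)
print(weight.data)
```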

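Learnable module parameters created with nn.Parameter still each need their own separately allocated memory

For example, given the parameter list weight=[nn.Parameter(torch.normal(0, 0.02, (2,2)))] * 10, every parameter in weight has to be saved individually as a module parameter with setattr before the optimizer can track it; torch.Tensor.__deepcopy__() does not help. Note that multiplying a list by 10 repeats the same Parameter object ten times rather than creating ten parameters, and a plain Python list is not registered by the module.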
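A minimal sketch of the setattr approach. The module name `Bank` and the attribute names `weight_0`, `weight_1`, ... are hypothetical, and the nn.ParameterList shown alongside is an alternative that the post does not mention.

```python
import torch
import torch.nn as nn

# `[p] * n` repeats one object: all ten entries are the same Parameter.
repeated = [nn.Parameter(torch.zeros(2, 2))] * 10
same_object = all(p is repeated[0] for p in repeated)


class Bank(nn.Module):
    def __init__(self, n=10):
        super().__init__()
        # A list comprehension creates n distinct Parameters.
        weight = [nn.Parameter(torch.normal(0, 0.02, (2, 2))) for _ in range(n)]
        for i, p in enumerate(weight):
            setattr(self, f"weight_{i}", p)   # registers p on the module
        # Alternative (not from the post): nn.ParameterList registers them too.
        self.extra = nn.ParameterList(
            [nn.Parameter(torch.zeros(2, 2)) for _ in range(n)]
        )


model = Bank()
names = [name for name, _ in model.named_parameters()]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
print(same_object, len(names), len(optimizer.param_groups[0]['params']))
```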
