element 0 of tensors does not require grad and does not have a grad_fn
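Stack Overflow[1] has a description of this problem. I wrote my own dice loss, it could not backpropagate, and it raised this error. The cause is most likely torch.argmax(): it returns integer indices and is not differentiable, so its result has no grad_fn and no gradient can flow back through it. Here input.size() is [10,2,513,513] and target.size() is [10,513,513]. In the end I wrote a multi-class dice loss based on softmax, and that worked.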
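import torch.nn.functional as F

def diceloss(self, output, target):
    # Softmax over the class dimension stays differentiable, unlike torch.argmax
    predicted = F.softmax(output, dim=1)
    # print(np.sum(predicted[:, 0]==0),np.sum(predicted[:, 0]>0),np.sum(predicted[:, 1]==0),np.sum(predicted[:, 1]>0))
    # Per-class Dice: channel 0 against pixels labelled 0, channel 1 against pixels labelled 1
    a = dice_coef(predicted[:, 0], (target == 0).float())
    b = dice_coef(predicted[:, 1], (target == 1).float())
    # Weighted sum of the two class scores, turned into a loss
    return 1 - (a*0.3 + b*0.7)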
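The snippet above calls dice_coef without showing it. The following is a minimal sketch of what such a helper could look like, assuming a soft Dice coefficient with a smoothing term. The function body, the `smooth` parameter and the small tensor sizes are assumptions for illustration, not the code from the original post. It also checks why torch.argmax breaks backpropagation while softmax does not.

```python
import torch
import torch.nn.functional as F


def dice_coef(pred, target, smooth=1.0):
    """Soft Dice coefficient between two tensors of the same shape.

    Hypothetical helper: the original post does not show its dice_coef.
    """
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    intersection = (pred * target).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)


# Small stand-ins for the [10, 2, 513, 513] logits and [10, 513, 513] labels.
output = torch.randn(2, 2, 8, 8, requires_grad=True)
target = torch.randint(0, 2, (2, 8, 8))

# argmax returns integer indices, so the result is cut off from the graph.
hard = torch.argmax(output, dim=1)
hard_requires_grad = hard.requires_grad  # False: a loss built on it cannot backpropagate

# softmax keeps the graph, so the loss can be backpropagated.
probs = F.softmax(output, dim=1)
loss = 1 - (0.3 * dice_coef(probs[:, 0], (target == 0).float())
            + 0.7 * dice_coef(probs[:, 1], (target == 1).float()))
loss.backward()
```

After `loss.backward()`, `output.grad` is populated, which is the behaviour the argmax-based version could not provide.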
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 4 at ../aten/src/TH/generic/THTensor.cpp:702
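This error was raised by the DataLoader. The mask dataset is supposed to contain black-and-white .bmp files, but some three-channel files had been mixed in. Reading them with matplotlib.pyplot and checking showed that the values in each channel appear to be identical.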
for name in lines:
    img = plt.imread('/home/cmz/dataset/CVC-EndoSceneStill/mask/'+name+'.bmp')
    print(img.shape)
    if img.ndim != 2:
        im = np.array(img)
        print(im[:, :, 0].shape)
        # Count non-zero and zero pixels per channel to compare the channels
        print(np.sum(im[:, :, 0]>0), np.sum(im[:, :, 0]==0))
        print(np.sum(im[:, :, 1]>0), np.sum(im[:, :, 1]==0))
        print(np.sum(im[:, :, 2]>0), np.sum(im[:, :, 2]==0))
        cv2.imwrite('/home/cmz/dataset/CVC-EndoSceneStill/mask/'+name+'.bmp', img)
        img = plt.imread('/home/cmz/dataset/CVC-EndoSceneStill/mask/' + name + '.bmp')
        print('after: ', img.shape)  # this approach does not help: the shape is unchanged
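What worked: read the file, pick one channel of im (the third one here, any would do), convert it to an Image, and save it over the original file.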
img = plt.imread(name+str(i)+'.bmp')
print(img.shape)
im = np.array(img)
print(im[:,:,0].shape)
new = im[:,:,2]
print(new.shape)
new = Image.fromarray(new)
new.save(name+str(i)+'.bmp')
img = plt.imread(name+str(i)+'.bmp')
print('after: ', img.shape)
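The fix above relies on the channels being identical, which was only checked by eye. A minimal sketch of the same idea with that check made explicit, using only NumPy, is shown below. The function name to_single_channel and the synthetic test array are assumptions for illustration.

```python
import numpy as np


def to_single_channel(mask):
    """Return a 2-D mask; for (H, W, C) input keep one channel if all agree."""
    mask = np.asarray(mask)
    if mask.ndim == 2:
        return mask
    first = mask[:, :, 0]
    # Only drop the extra channels when they carry no extra information.
    for c in range(1, mask.shape[2]):
        if not np.array_equal(first, mask[:, :, c]):
            raise ValueError("channels differ; refusing to collapse the mask")
    return first


# Synthetic three-channel mask whose channels are all the same.
rgb_like = np.repeat(np.eye(4, dtype=np.uint8)[:, :, None] * 255, 3, axis=2)
single = to_single_channel(rgb_like)
```

The resulting 2-D array can then be passed to Image.fromarray and saved, as in the snippet above.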
expected backend CPU and dtype Float but got backend CPU and dtype Long
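The operation expected a Float tensor on the CPU but received a Long tensor on the CPU, so the dtype has to be converted. Tensors can be converted directly with Tensor.float() and Tensor.long(), on the CPU as well as on CUDA. If the data is still a NumPy array, use ndarray.astype('float32') before turning it into a tensor.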
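A short sketch of both conversion routes, run entirely on the CPU. The variable names are illustrative only.

```python
import numpy as np
import torch

labels = torch.tensor([0, 1, 1])        # integer data defaults to torch.int64 (Long)
as_float = labels.float()               # Long -> Float on a CPU tensor
back_to_long = as_float.long()          # Float -> Long

arr = np.array([0, 1, 1])               # NumPy side: convert with astype
arr32 = arr.astype('float32')
from_numpy = torch.from_numpy(arr32)    # tensor that shares memory with arr32
```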
ValueError: Type must be a sub-type of ndarray type
I no longer remember the details; most likely a Tensor was passed somewhere that expected a NumPy array.
a leaf Variable that requires grad has been used in an in-place operation.
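My dice loss kept giving wrong results and I could not find the cause, so I suspected that the loss could not be initialised as loss=0 and had to be initialised as torch.tensor([0.], requires_grad=True). The results were still wrong, and this error appeared. Such a tensor is a leaf that requires grad, so modifying it in place (for example with loss += ...) is not allowed; this is presumably what happened here.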
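A minimal sketch that reproduces the error and shows one way around it, assuming the loss was accumulated with `+=` (the post does not show the accumulation code, so this is a guess at the pattern). Starting from a plain 0 and accumulating out of place avoids the problem.

```python
import torch

leaf = torch.tensor([0.], requires_grad=True)
try:
    leaf += 1.0            # in-place update of a leaf that requires grad
    message = ""
except RuntimeError as err:
    message = str(err)     # the "leaf Variable ... in-place operation" error

# Out-of-place accumulation starting from a plain Python 0 works with autograd.
w = torch.ones(3, requires_grad=True)
loss = 0
for scale in (1.0, 2.0):
    loss = loss + (w * scale).sum()
loss.backward()            # w.grad is 1.0 + 2.0 = 3.0 for every element
```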
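Merging several NumPy arrays into one
The goal is to combine three ndarrays of shape (4,4) into one of shape (3,4,4). np.stack does this directly; np.vstack also works if each array is first given a leading axis of length 1, otherwise it produces shape (12,4). For array concatenation and splitting in general, see the post "Python之Numpy数组拼接,组合,连接" (NumPy array concatenation, combination and joining in Python).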
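The difference between the two functions is easiest to see from the resulting shapes. A small sketch with three dummy (4,4) arrays:

```python
import numpy as np

arrays = [np.full((4, 4), i) for i in range(3)]

stacked = np.stack(arrays, axis=0)                  # new leading axis: (3, 4, 4)
vstacked = np.vstack(arrays)                        # joins existing rows: (12, 4)
via_vstack = np.vstack([a[None] for a in arrays])   # (1, 4, 4) pieces: (3, 4, 4)
```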
TypeError: sum() received an invalid combination of arguments - got (out=NoneType,pe, ), but expected one of:
* ()
* (torch.dtype dtype)
* (tuple of ints dim, torch.dtype dtype)
didn't match because some of the keywords were incorrect: out, axis
* (tuple of ints dim, bool keepdim, torch.dtype dtype)
* (tuple of ints dim, bool keepdim)
didn't match because some of the keywords were incorrect: out, axis
The message says torch.sum was called incorrectly. I did not know why, so I tried it on its own:
>>> a
tensor([[-0.9756, -0.1566, -0.3777, 0.2898],
[ 0.2670, -1.2035, -1.1633, 0.7887],
[ 0.9774, -0.2306, 0.0794, -1.3420],
[ 1.6269, -0.0125, 0.2716, 0.3390]])
>>> a.sum(1)
tensor([-1.2202, -1.3110, -0.5160, 2.2250])
>>> a.sum(axis=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum() received an invalid combination of arguments - got (axis=int, ), but expected one of:
* ()
didn't match because some of the keywords were incorrect: axis
* (torch.dtype dtype)
didn't match because some of the keywords were incorrect: axis
* (tuple of ints dim, torch.dtype dtype)
* (tuple of ints dim, bool keepdim, torch.dtype dtype)
* (tuple of ints dim, bool keepdim)
>>> a.sum(dim=1)
tensor([-1.2202, -1.3110, -0.5160, 2.2250])
>>> torch.sum(a,dim=1)
tensor([-1.2202, -1.3110, -0.5160, 2.2250])
>>>
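It works here, but in my program the Tensor.sum(dim=1) form kept raising the error above, and I had to use torch.sum() instead. The out and axis keywords in that message suggest the call may actually have gone through a NumPy-style sum applied to a tensor, though I did not confirm this. Some further issues are covered in reference 1.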
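A small sketch of the calls that avoid the NumPy-style keywords. It assumes the failing call reached Tensor.sum with axis/out arguments, which the post does not show. If a NumPy reduction is really wanted, converting the tensor explicitly first sidesteps the problem.

```python
import torch

a = torch.arange(16, dtype=torch.float32).reshape(4, 4)

row_sums_method = a.sum(dim=1)            # tensor method with the dim keyword
row_sums_function = torch.sum(a, dim=1)   # functional form, same result

# Explicit conversion before using NumPy's own sum with axis=...
row_sums_numpy = a.numpy().sum(axis=1)
```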
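from torch._C import *
ImportError: DLL load failed: The specified module could not be found.
Caused by the numpy version. Downgrading from 1.17.1 to 1.16 did not help; 1.15.1 worked. See reference 2.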
Cannot import skimage or sklearn
Possibly a numpy version problem. Upgrade scikit-image / scikit-learn, or downgrade numpy to 1.14.
process 1 terminated with exit code 1
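This appeared in the StyleGAN2 code when multi-GPU training was enabled. Setting multi-gpu to False made the error disappear; I did not look into it further.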
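TypeError: Caught TypeError in replica 1 on device 1.
TypeError: forward() missing 1 required positional argument: 'input'
The model replica cannot find the argument passed to it. This happened because multiple GPUs were in use and, during training, the model and the data did not end up on the same GPU; the batch is split across the GPUs, so a replica can be left without its share of the input. Setting batch_size to an integer multiple of the number of GPUs in use fixed it.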
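The rule of thumb above can be written down as a small helper. This is only a sketch: round_batch_size is a hypothetical name, not part of PyTorch, and it simply rounds a requested batch size down to a multiple of the GPU count.

```python
def round_batch_size(batch_size, n_gpus):
    """Round batch_size down to a multiple of n_gpus (never below n_gpus)."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return max(n_gpus, (batch_size // n_gpus) * n_gpus)


# Example: a requested batch of 10 on 4 GPUs becomes 8.
adjusted = round_batch_size(10, 4)
```

The last batch of an epoch can still be smaller than the rest; passing drop_last=True to torch.utils.data.DataLoader is one way to avoid that.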
About the optimizer torch.optim.SGD
A freshly created optimizer instance, optimizer:
optimizer.state: defaultdict(<class 'dict'>, {})
optimizer.param_groups: [{'params': [], 'lr': 0.01, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0005, 'nesterov': False, 'initial_lr': 0.01}]
params holds the parameters that were passed in, which have requires_grad=True by default. The values in optimizer.param_groups can be set by hand: for example, if weight is assigned there, optimizer.param_groups[0]['params'] will contain weight. In addition, optimizer.state[weight]['momentum_buffer'] must have the same shape as weight, and the value of this momentum_buffer can also be set by hand.
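A minimal sketch for inspecting these structures, assuming a single 2x3 parameter named weight (the shape and the hyperparameters are chosen only for illustration). The state is empty right after construction, and the momentum buffer appears after the first step.

```python
import torch

weight = torch.nn.Parameter(torch.zeros(2, 3))
optimizer = torch.optim.SGD([weight], lr=0.01, momentum=0.9, weight_decay=0.0005)

state_before = len(optimizer.state)     # no per-parameter state yet
same_object = optimizer.param_groups[0]['params'][0] is weight

weight.sum().backward()                 # populates weight.grad
optimizer.step()                        # first step creates the momentum buffer
buffer = optimizer.state[weight]['momentum_buffer']
```

The buffer has the same shape as the parameter it belongs to, which is why a hand-written replacement has to match that shape.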
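Learnable module parameters created with nn.Parameter still each need their own separately allocated memory
For example, given the parameter list weight=[nn.Parameter(torch.normal(0, 0.02, (2,2)))] * 10, the list multiplication repeats one and the same Parameter object ten times. Each entry has to be a distinct parameter and be saved individually as a module parameter with setattr before the optimizer can track it; torch.Tensor.__deepcopy__() did not help.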
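The effect of the list multiplication can be checked by counting what the module reports as parameters. In the sketch below, the class names Shared and Separate are made up for the comparison; both register their entries with setattr, and only the way the list is built differs.

```python
import torch
from torch import nn


class Shared(nn.Module):
    def __init__(self):
        super().__init__()
        # [p] * 10 repeats ONE Parameter object ten times.
        weight = [nn.Parameter(torch.normal(0, 0.02, (2, 2)))] * 10
        for i, p in enumerate(weight):
            setattr(self, f"weight_{i}", p)


class Separate(nn.Module):
    def __init__(self):
        super().__init__()
        # A comprehension builds ten independent Parameter objects.
        weight = [nn.Parameter(torch.normal(0, 0.02, (2, 2))) for _ in range(10)]
        for i, p in enumerate(weight):
            setattr(self, f"weight_{i}", p)


n_shared = len(list(Shared().parameters()))      # duplicates are reported once
n_separate = len(list(Separate().parameters()))  # ten distinct parameters
```

nn.ParameterList is another way to register a list of independent parameters without calling setattr for each one.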