Error output:
Traceback (most recent call last):
  File "train_seg_luad.py", line 291, in <module>
    train(cfg=cfg)
  File "train_seg_luad.py", line 179, in train
    _, segs, attns = wetr(inputs)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data5/pengzhang/TPRO-main/seg_network/model.py", line 51, in forward
    seg = self.decoder(_x)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data5/pengzhang/TPRO-main/seg_network/segformer_head.py", line 135, in forward
    x = self.linear_pred(x)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
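Solution:
Here, the cause was that the GPU did not have enough free memory. My program needs about 6 GB of GPU memory. I was initially running on GPU 0, which had only about 4 GB free. That is clearly less than the program needs, so the error above appeared. After I switched to GPU 3 (which had about 10 GB free), the program ran normally.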
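To avoid picking a GPU by hand, the check described above can be automated. The following is only a minimal sketch, not part of the original training script: the helper names `query_free_mib`, `pick_gpu`, and `use_gpu_with_free_memory` are hypothetical, it assumes `nvidia-smi` is on the PATH, and it treats the 6 GB requirement from this post as roughly 6 * 1024 MiB.

```python
import os
import subprocess


def query_free_mib():
    """Return the free memory (MiB) of each GPU, in nvidia-smi index order."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line) for line in out.strip().splitlines()]


def pick_gpu(free_mib, required_mib):
    """Return the index of the GPU with the most free memory,
    or None if no GPU has at least required_mib free."""
    if not free_mib:
        return None
    best = max(range(len(free_mib)), key=lambda i: free_mib[i])
    return best if free_mib[best] >= required_mib else None


def use_gpu_with_free_memory(required_mib):
    """Restrict this process to one GPU that has enough free memory."""
    gpu = pick_gpu(query_free_mib(), required_mib)
    if gpu is None:
        raise RuntimeError(f"No GPU has {required_mib} MiB free")
    # Make CUDA number devices the same way nvidia-smi does (PCI bus order).
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Only takes effect if set before CUDA is initialised in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

One way to use it would be to call `use_gpu_with_free_memory(6 * 1024)` at the very top of the training script, before the first CUDA call (most safely, before `import torch`). With the numbers from this post (GPU 0 with about 4 GB free, GPU 3 with about 10 GB free), `pick_gpu` would select GPU 3. Free memory can change between the check and the start of training if other jobs share the machine, so this reduces the chance of the error but does not rule it out.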