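This article is reposted from a blog.
Deep learning study notes
While my model is still training, I am taking the chance to write down a few small bugs I ran into.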
Bug 1, hit during training: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.) return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
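Fix:
This is a bug in PyTorch 1.9 that is due to be fixed in the next release. It is a warning rather than an error, so training still runs; after I downgraded PyTorch to 1.8 the message no longer appeared.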
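If downgrading is not convenient, the message can also be filtered out, since it is an ordinary UserWarning. The following is only a minimal sketch using Python's standard warnings module; the helper name silence_named_tensor_warning is my own, and it hides the message without changing anything inside PyTorch.

```python
import warnings


def silence_named_tensor_warning():
    """Ignore only the 'Named tensors ...' UserWarning; others still show."""
    warnings.filterwarnings(
        "ignore",
        # `message` is a regex matched against the start of the warning text.
        message="Named tensors and all their associated APIs",
        category=UserWarning,
    )


# Call once, before the model's first forward pass.
silence_named_tensor_warning()
```

Matching on the start of the message keeps every other UserWarning visible, which is safer than ignoring the whole category.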
Bug 2: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 23.70 GiB total capacity; 21.56 GiB already allocated; 206.81 MiB free; 21.83 GiB reserved in total by PyTorch)
Fix: the GPU has run out of memory. The most direct fix in PyTorch is to reduce batch_size, so that each training step has less to allocate.
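Finding a batch size that fits can be automated. The sketch below is not PyTorch API: run_with_smaller_batches and train_step are hypothetical names, and train_step stands for whatever function runs one training step at a given batch size. It relies on the out-of-memory error being a RuntimeError whose text contains "out of memory", as in the message above.

```python
def run_with_smaller_batches(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving batch_size after each OOM error.

    Returns the first batch size that ran without an out-of-memory error.
    """
    while True:
        try:
            train_step(batch_size)
            return batch_size
        except RuntimeError as err:
            is_oom = "out of memory" in str(err)
            # Re-raise unrelated errors, and give up once we hit the floor.
            if not is_oom or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

In a real training script it may also help to call torch.cuda.empty_cache() before retrying, so that cached blocks from the failed attempt are released first.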
Bug 3: cv2.error: OpenCV(4.1.0)error: (-215:Assertion failed) !ssize.empty() in function 'resize'
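Fix: I spent quite a while in the debugger before tracing this to the path built with os.path.join. The same code behaved differently on Windows and Linux: Windows raised no error, but Linux did. The reason is that Windows accepts both '/' and '\' as path separators, while Linux only accepts '/'. os.path.join inserts the separator of the OS it runs on ('\' on Windows, '/' on Linux), so the backslashes most likely came from the path components themselves, for example entries in a file list written on Windows. On Linux such a path does not point to an existing file, cv2.imread returns None instead of raising, and the later resize call fails with the assertion above. The fix on Linux is to add a replace call that turns '\' into '/', for example: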
os.path.join(self.root, face).replace('\\','/')
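The same idea can be wrapped in a small helper that also fails early with a readable message, instead of letting an empty image reach resize. This is only a sketch: portable_join and checked_image_path are hypothetical names, and it assumes none of the real file names contain a literal backslash (which is legal on Linux).

```python
import os
import posixpath


def portable_join(root, relative):
    """Join two path parts with '/', treating any '\\' as a separator."""
    return posixpath.join(root.replace("\\", "/"), relative.replace("\\", "/"))


def checked_image_path(root, relative):
    """Return the joined path, or raise if no file exists at that path."""
    path = portable_join(root, relative)
    if not os.path.isfile(path):
        # Surface the bad path here; cv2.imread would silently return None.
        raise FileNotFoundError(f"image not found: {path}")
    return path
```

Forward slashes work on Windows as well, so the normalized path can be used on both systems.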