1. one of the variables needed for gradient computation has been modified by an inplace operation
This error can have several causes:
(1) functions whose names end in '_' (e.g. tensor.add_());
(2) compound assignments such as += and /=;
(3) activation functions constructed with inplace=True, e.g. torch.nn.ReLU(inplace=True).
For case (2), replacing a += b with a = a + b is enough; for case (3), set inplace to False. Case (1) is more troublesome: you need to find an out-of-place substitute or reimplement the function yourself.
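A minimal reproduction and fix for the += case, assuming a recent PyTorch version (the tensors and values here are illustrative):

```python
import torch

# Reproduce the error: relu's backward pass needs its saved output,
# and an in-place op overwrites it.
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.relu(x)
try:
    y += 1                      # in-place: bumps y's version counter
    y.sum().backward()          # autograd detects the modification here
    failed = False
except RuntimeError:
    failed = True

# Fix: replace `y += 1` with the out-of-place `y = y + 1`.
x2 = torch.tensor([1.0, 2.0], requires_grad=True)
y2 = torch.relu(x2)
z2 = y2 + 1                     # new tensor; relu's saved output is untouched
z2.sum().backward()
```

The out-of-place form allocates a new tensor instead of mutating the one autograd saved, so backward succeeds.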
2. An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This error appears when using torch.multiprocessing.spawn. PyTorch's torch.multiprocessing is built on top of Python's native multiprocessing module. When multiprocessing starts a child process on another CPU core, it re-imports the main .py file. So in the following example:
# start process for each gpu
mp.spawn(main, nprocs=args.g, args=(args,))
When the file is re-imported to run on another CPU core, this line executes again, recursively spawning a large number of child processes and eventually triggering the error above. The call should therefore be guarded as follows:
if __name__ == '__main__':
    # start process for each gpu
    mp.spawn(main, nprocs=args.g, args=(args,))
With the guard in place, the child processes no longer run mp.spawn.
3. Note: to_tensor automatically converts an input image from the [0, 255] range to [0, 1].
Official documentation: https://pytorch.org/vision/stable/generated/torchvision.transforms.ToTensor.html#torchvision.transforms.ToTensor
4. If the model contains dropout or batchnorm layers, be sure to call eval() at test time (including after restoring the model from a file), because these two modules behave differently in train and test mode. Without it, PyTorch keeps them in train mode by default, which degrades test results.
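A small illustration with Dropout (the one-layer model here is a placeholder for a real network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Dropout(p=0.5))
x = torch.ones(1, 10)

model.train()            # default mode: dropout randomly zeroes inputs
out_train = model(x)     # and rescales the survivors by 1 / (1 - p)

model.eval()             # eval mode: dropout becomes the identity
out_eval = model(x)

print(out_eval)          # identical to x: dropout is disabled
```

The same train/eval distinction applies to BatchNorm, which switches from batch statistics to its running statistics in eval mode.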
5. NotImplementedError: Could not run 'aten::slow_conv3d_forward' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::slow_conv3d_forward' is only available for these backends: [CPU, BackendSelect, Python, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, AutocastCPU, Autocast, Batched, VmapMode, Functionalize].
In my case, the actual cause was that the data had not been moved onto the GPU with .to(device).
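A sketch of the fix, assuming a Conv3d model (layer sizes and shapes are illustrative); the snippet falls back to CPU so it also runs on machines without a GPU:

```python
import torch
import torch.nn as nn

# Fall back to CPU when no GPU is present.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical 3D conv model, moved to the target device.
model = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3).to(device)

x = torch.randn(1, 1, 8, 8, 8)  # batch is created on the CPU
x = x.to(device)                # move it to the same device as the model

out = model(x)
print(out.shape)                # torch.Size([1, 4, 6, 6, 6])
```

Forgetting the `x = x.to(device)` line hands CPU tensors to a CUDA model, which in my case surfaced as the backend error above rather than the more common device-mismatch message.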