PyTorch pitfalls log:

I. Issues when using multiprocessing
1. A "cannot allocate memory" error occurs:

fork OSError: [Errno 12] Cannot allocate memory

Solution:
Restart the machine,
or switch to the spawn start method
by adding this code once at program start:

    mp.set_start_method('spawn')
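
As a minimal sketch of where the spawn setting fits in a script, the example below uses the standard-library `multiprocessing` module (`torch.multiprocessing` wraps the same API). The `worker` and `run_workers` names are hypothetical and only stand in for real training code. It uses `mp.get_context("spawn")` rather than `mp.set_start_method('spawn')`, which selects the same start method without changing the process-wide default.

```python
import multiprocessing as mp


def worker(rank, queue):
    # Hypothetical worker: send back a small result instead of training.
    queue.put(rank * rank)


def run_workers(num_workers=2):
    # A context object selects "spawn" locally, without touching the
    # process-wide default that mp.set_start_method() would change.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(rank, queue))
        for rank in range(num_workers)
    ]
    for proc in procs:
        proc.start()
    # Read the results before joining so no child blocks on a full pipe.
    results = sorted(queue.get(timeout=60) for _ in procs)
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    # Spawned children re-import this file, so process creation must stay
    # behind the __main__ guard.
    print(run_workers())
```

With spawn, each child starts a fresh interpreter instead of copying the parent's memory image, which is why it can avoid the fork-time allocation failure at the cost of slower process start-up.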

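2. A PyTorch model saved from multiprocessing code cannot be loaded afterwards.

This happens because multiprocessing's shared data types cannot be saved as part of a PyTorch model.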
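
One way to avoid this, sketched under the assumption that the shared objects are `mp.Value` / `mp.Array` instances, is to copy them into ordinary Python values before saving. The names `to_plain_checkpoint`, `step_counter` and `shared_rewards` are hypothetical, and `pickle` stands in for `torch.save`, which uses pickle by default.

```python
import multiprocessing as mp
import pickle


def to_plain_checkpoint(step_counter, shared_rewards):
    """Copy shared multiprocessing objects into ordinary Python values."""
    return {
        "step": int(step_counter.value),     # mp.Value -> int
        "rewards": list(shared_rewards[:]),  # mp.Array -> list of floats
    }


if __name__ == "__main__":
    step_counter = mp.Value("i", 3)             # hypothetical shared counter
    shared_rewards = mp.Array("d", [1.0, 2.5])  # hypothetical shared buffer

    checkpoint = to_plain_checkpoint(step_counter, shared_rewards)
    # The plain dict serializes and restores without any shared-memory handle.
    restored = pickle.loads(pickle.dumps(checkpoint))
    print(restored)
```

The same idea applies to anything stored next to the model weights: keep only tensors and plain Python values in the object that gets saved.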

II. Errors during computation
1. 'out' and 'self' are unexpectedly on the CPU

return F.linear(input, self.weight, self.bias)
  File "/home/dq/anaconda2/envs/dq/lib/python3.6/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
python-BaseException

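One fix that has worked:
If sub-networks are created inside a plain Python list and the forward pass iterates over that list, this error appears. Layers held in a plain list are not registered as submodules, so moving the model to the GPU leaves them on the CPU. Replacing the list with nn.Sequential resolves it.
Offending code: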

self.q_outs = [init_(nn.Linear(self.hidden_size, 1)) for _ in range(num_q_outs)]

Fix:

self.q_outs = nn.Sequential(*[init_(nn.Linear(self.hidden_size, 1)) for _ in range(num_q_outs)])
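
The difference can be checked on the CPU by counting registered parameters. The sketch below is not the original network: the class names and sizes are hypothetical, `init_` is omitted, and it uses `nn.ModuleList`, a container meant for layers that are iterated over rather than chained. It registers submodules the same way `nn.Sequential` does.

```python
import torch
import torch.nn as nn


class QHeadsList(nn.Module):
    """Heads kept in a plain Python list: NOT registered as submodules."""

    def __init__(self, hidden_size, num_q_outs):
        super().__init__()
        self.q_outs = [nn.Linear(hidden_size, 1) for _ in range(num_q_outs)]


class QHeadsModuleList(nn.Module):
    """Heads kept in nn.ModuleList: registered, so .to(device) moves them."""

    def __init__(self, hidden_size, num_q_outs):
        super().__init__()
        self.q_outs = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(num_q_outs)]
        )

    def forward(self, x):
        # Each head is applied to the same input, one output per head.
        return [head(x) for head in self.q_outs]


def count_params(module):
    # Number of parameter tensors the module knows about.
    return sum(1 for _ in module.parameters())


if __name__ == "__main__":
    print(count_params(QHeadsList(8, 3)))        # plain list version
    print(count_params(QHeadsModuleList(8, 3)))  # registered version
```

Parameters that are not registered are also skipped by the optimizer and by `state_dict()`, so the plain-list version would neither train nor save those heads even on the CPU.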

III. Troubleshooting the wandb plotting tool
1. The following error appears:

Traceback (most recent call last):
  File "train/train_mpe.py", line 6, in <module>
    import wandb
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/__init__.py", line 32, in <module>
    from wandb import sdk as wandb_sdk
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/__init__.py", line 12, in <module>
    from .wandb_init import _attach, init  # noqa: F401
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 35, in <module>
    from .backend.backend import Backend
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/backend/backend.py", line 20, in <module>
    from ..interface.interface import InterfaceBase
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/interface/interface.py", line 18, in <module>
    from wandb.proto import wandb_internal_pb2 as pb
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/proto/wandb_internal_pb2.py", line 5, in <module>
    from google.protobuf import descriptor as _descriptor
  File "/home2/dengqi/.local/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 47, in <module>
    from google.protobuf.pyext import _message
AttributeError: module 'google.protobuf.internal.containers' has no attribute 'MutableMapping'
training is done!

Or:

Traceback (most recent call last):
  File "train/train_mpe.py", line 6, in <module>
    import wandb
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/__init__.py", line 32, in <module>
    from wandb import sdk as wandb_sdk
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/__init__.py", line 12, in <module>
    from .wandb_init import _attach, init  # noqa: F401
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 35, in <module>
    from .backend.backend import Backend
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/backend/backend.py", line 20, in <module>
    from ..interface.interface import InterfaceBase
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/interface/interface.py", line 18, in <module>
    from wandb.proto import wandb_internal_pb2 as pb
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/proto/wandb_internal_pb2.py", line 15, in <module>
    from wandb.proto import wandb_base_pb2 as wandb_dot_proto_dot_wandb__base__pb2
  File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/proto/wandb_base_pb2.py", line 21, in <module>
    create_key=_descriptor._internal_create_key,
AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
training is done!

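Solution (both tracebacks fail inside google.protobuf, which points to a protobuf version that does not match the installed wandb; pinning protobuf resolved it):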

pip3 install protobuf==3.15.7
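
Before and after pinning, it can help to confirm which versions are actually installed. The sketch below assumes Python 3.8 or newer, where `importlib.metadata` is available. The tracebacks above come from Python 3.6, where `pip3 show protobuf` gives the same information. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("protobuf", "wandb"):
        print(name, installed_version(name))
```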