I. Summary of problems when using multiprocessing
1. "Cannot allocate memory" error:
fork OSError: [Errno 12] Cannot allocate memory
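Solution:
Reboot the machine;
or switch to the spawn start method, which starts each child as a fresh interpreter instead of forking the parent process.
Add this code: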
mp.set_start_method('spawn')
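A minimal sketch of where that call usually goes, assuming the module is imported as `import multiprocessing as mp`. The names `use_spawn` and `worker` are hypothetical and not from the original code. The start method should be selected once, inside the `if __name__ == "__main__":` guard, because spawn re-imports the main module in every child.

```python
import multiprocessing as mp


def use_spawn():
    """Select the spawn start method (safe to call more than once)."""
    if mp.get_start_method(allow_none=True) != "spawn":
        # force=True avoids a RuntimeError if another method was already set
        mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


def worker(name):
    # Must be a module-level function so the spawned child can import it
    print(f"hello from {name}")


if __name__ == "__main__":
    use_spawn()
    p = mp.Process(target=worker, args=("child",))
    p.start()
    p.join()
```

With spawn, everything passed to a child process has to be picklable, so lambdas and locally defined functions cannot be used as targets.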
2. A PyTorch model saved from multiprocessing code cannot be loaded afterwards.
This is because multiprocessing's shared data types cannot be saved inside a PyTorch model.
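One possible workaround, sketched under the assumption that the shared objects are counters or arrays (`mp.Value`, `mp.Array`) stored next to the model state: copy them into plain Python values before saving. `torch.save` serializes with pickle by default, so the sketch uses `pickle` directly to stay self-contained. The helper name `to_plain` and the dictionary keys are hypothetical.

```python
import multiprocessing as mp
import pickle


def to_plain(state):
    """Copy `state`, replacing multiprocessing shared objects with plain values."""
    plain = {}
    for key, item in state.items():
        if hasattr(item, "get_lock") and hasattr(item, "value"):
            plain[key] = item.value   # mp.Value -> int / float
        elif hasattr(item, "get_lock"):
            plain[key] = list(item)   # mp.Array -> list
        else:
            plain[key] = item         # already a plain object
    return plain


if __name__ == "__main__":
    state = {
        "episodes": mp.Value("i", 42),
        "returns": mp.Array("d", [1.0, 2.5]),
        "lr": 3e-4,
    }
    # The plain copy round-trips through pickle without the shared wrappers
    restored = pickle.loads(pickle.dumps(to_plain(state)))
    print(restored)
```

The same dictionary could then be handed to `torch.save` in place of the original one.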
II. Runtime errors
1. `out` and `self` inexplicably end up on the CPU
return F.linear(input, self.weight, self.bias)
File "/home/dq/anaconda2/envs/dq/lib/python3.6/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
python-BaseException
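One fix that has worked:
If a sub-network is created inside a plain Python list, iterating over that list in the forward pass raises this error, because modules held in a plain list are not registered as submodules and so are not moved to the GPU together with the parent model. Replacing the list with nn.Sequential fixes it (nn.ModuleList registers the modules the same way and is the container meant for iteration).
Offending code: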
self.q_outs = [init_(nn.Linear(self.hidden_size, 1)) for _ in range(num_q_outs)]
Fix:
self.q_outs = nn.Sequential(*[init_(nn.Linear(self.hidden_size, 1)) for _ in range(num_q_outs)])
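The difference can be seen without a GPU by counting the parameters the parent module knows about. This is a small sketch, not the original network: the class names `ListHeads` and `RegisteredHeads` are hypothetical, the `init_` wrapper from the original code is left out, and `nn.ModuleList` is used as the registering container in place of `nn.Sequential`.

```python
import torch
import torch.nn as nn


class ListHeads(nn.Module):
    """Heads kept in a plain list: invisible to .to(), .cuda() and .parameters()."""

    def __init__(self, hidden_size=8, num_q_outs=3):
        super().__init__()
        self.q_outs = [nn.Linear(hidden_size, 1) for _ in range(num_q_outs)]

    def forward(self, x):
        return [q_out(x) for q_out in self.q_outs]


class RegisteredHeads(nn.Module):
    """Heads kept in nn.ModuleList: registered as submodules of the parent."""

    def __init__(self, hidden_size=8, num_q_outs=3):
        super().__init__()
        self.q_outs = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(num_q_outs)]
        )

    def forward(self, x):
        return [q_out(x) for q_out in self.q_outs]


if __name__ == "__main__":
    print(len(list(ListHeads().parameters())))        # 0: nothing registered
    print(len(list(RegisteredHeads().parameters())))  # 6: 3 x (weight, bias)
```

Because the plain-list version registers no parameters, calling `.to(device)` on the parent leaves those layers on the CPU, and they are also missing from the optimizer and from `state_dict()`.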
III. Troubleshooting notes for the plotting tool wandb
1. The error is as follows:
Traceback (most recent call last):
File "train/train_mpe.py", line 6, in <module>
import wandb
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/__init__.py", line 32, in <module>
from wandb import sdk as wandb_sdk
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/__init__.py", line 12, in <module>
from .wandb_init import _attach, init # noqa: F401
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 35, in <module>
from .backend.backend import Backend
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/backend/backend.py", line 20, in <module>
from ..interface.interface import InterfaceBase
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/interface/interface.py", line 18, in <module>
from wandb.proto import wandb_internal_pb2 as pb
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/proto/wandb_internal_pb2.py", line 5, in <module>
from google.protobuf import descriptor as _descriptor
File "/home2/dengqi/.local/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 47, in <module>
from google.protobuf.pyext import _message
AttributeError: module 'google.protobuf.internal.containers' has no attribute 'MutableMapping'
training is done!
or
Traceback (most recent call last):
File "train/train_mpe.py", line 6, in <module>
import wandb
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/__init__.py", line 32, in <module>
from wandb import sdk as wandb_sdk
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/__init__.py", line 12, in <module>
from .wandb_init import _attach, init # noqa: F401
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/wandb_init.py", line 35, in <module>
from .backend.backend import Backend
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/backend/backend.py", line 20, in <module>
from ..interface.interface import InterfaceBase
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/sdk/interface/interface.py", line 18, in <module>
from wandb.proto import wandb_internal_pb2 as pb
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/proto/wandb_internal_pb2.py", line 15, in <module>
from wandb.proto import wandb_base_pb2 as wandb_dot_proto_dot_wandb__base__pb2
File "/home2/dengqi/.local/lib/python3.6/site-packages/wandb/proto/wandb_base_pb2.py", line 21, in <module>
create_key=_descriptor._internal_create_key,
AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
training is done!
Solution:
pip3 install protobuf==3.15.7
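To confirm that the pinned version is the one the interpreter actually sees, a small check like the following may help. It is only a sketch: the helper name `installed_version` is hypothetical, and the fallback branch assumes setuptools' `pkg_resources` is available on older Pythons such as the 3.6 environment in the tracebacks.

```python
def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python < 3.8: fall back to setuptools' pkg_resources
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("protobuf")
    status = "ok" if found == "3.15.7" else "expected 3.15.7"
    print(f"protobuf: {found} ({status})")
```

Running it with the same interpreter that launches the training script matters, since a user-level install under `~/.local` can shadow the environment's own copy.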