Shared memory problem: unable to open shared memory object </torch_> in read-write mode
I'm using NAS (neural architecture search); the network is too large to fit on a single GPU, so I tried DDP to run multi-GPU training.
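The error in the title refers to POSIX shared memory objects (files such as `/torch_<pid>_<id>` under `/dev/shm` on Linux), which PyTorch worker processes use to pass tensors between processes under DDP. The sketch below illustrates the same underlying mechanism with only the Python standard library (3.8+); it is an illustrative analogy, not PyTorch's actual code, and all names in it are made up for the example.

```python
# Illustrative sketch of POSIX shared memory objects, the mechanism behind
# the "unable to open shared memory object" error. Stdlib only; not PyTorch.
from multiprocessing import shared_memory

# Create a shared memory object and write to it (read-write mode).
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# A second handle attaches to the SAME object by name, the way another
# worker process would.
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:5])

other.close()
shm.close()
shm.unlink()  # removes the object from /dev/shm; a worker that still tries
              # to open it afterwards fails with an error like the one above
print(data)   # b'hello'
```

If one process unlinks (or crashes and cleans up) a segment while another still expects it, attaching fails, which is the general shape of the failure in the traceback below.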
(py36torch15) xx@cluster:~/wang/FasterCrowdCountingNAS/FBNetBranch$ python main.py
/home//anaconda3/envs/py36torch15/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:23: UserWarning: You requested multiple GPUs but did not specify a backend, e.g. Trainer(distributed_backend=dp) (or ddp, ddp2). Setting distributed_backend=ddp for you.
warnings.warn(*args, **kwargs)
GPU available: True, used: True
No environment variable for node rank defined. Set as 0.
CUDA_VISIBLE_DEVICES: [0,1,2]
Traceback (most recent call last):
File "main.py", line 29, in <module>
File "/home//anaconda3/envs/py36torch15/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 844, in fit
File "