Distributed training error on the nuScenes dataset

Error message

Traceback (most recent call last):
  File "./tools/train.py", line 261, in <module>
    main()
  File "./tools/train.py", line 250, in main
    custom_train_model(
  File "/hy-tmp/mmdetection3d-1.0.0rc6/OccNet/projects/mmdet3d_plugin/bevformer/apis/train.py", line 27, in custom_train_model
    custom_train_detector(
  File "/hy-tmp/mmdetection3d-1.0.0rc6/OccNet/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py", line 199, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 49, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1043, in __init__
    w.start()
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'dict_keys' object

The traceback shows that a `dict_keys` object cannot be pickled, which prevents the DataLoader worker processes from starting and so breaks multi-GPU DDP training. Based on a fix shared on GitHub:

Adding `torch.multiprocessing.set_start_method('fork')` to train.py resolves it:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()
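The root cause can be reproduced in isolation: `dict_keys` view objects are not picklable, and the `spawn` start method must pickle everything it sends to worker processes, while `fork` does not pickle at all. A minimal sketch (the dictionary here is a made-up stand-in for whatever the dataset holds):

```python
import pickle

d = {"a": 1, "b": 2}
keys_view = d.keys()  # a dict_keys view object, as held by the dataset

try:
    pickle.dumps(keys_view)  # spawn-based workers must pickle this
except TypeError as e:
    print(e)  # cannot pickle 'dict_keys' object

# Converting the view to a plain list makes it picklable, which is an
# alternative fix applied inside the dataset code itself.
data = pickle.dumps(list(keys_view))
print(pickle.loads(data))  # ['a', 'b']
```

So instead of (or in addition to) switching the start method, replacing `some_dict.keys()` with `list(some_dict.keys())` wherever the dataset stores it would also remove the error.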

The `torch.multiprocessing.set_start_method('fork')` statement specifies how child processes are started when PyTorch uses the `multiprocessing` module. The `fork` start method is the default on Unix-based systems and is generally the most efficient way to create child processes.

With the `fork` start method, the parent process creates a new copy of itself in memory (a fork), and the child process starts executing from the same memory state as the parent. This means the child can access all of the variables and data structures that existed in the parent, without any pickling, which can improve performance in some cases.
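The inheritance described above can be demonstrated directly: under `fork`, a child process sees module-level state from the parent without it ever being pickled. A small sketch (the config dictionary and helper names are illustrative, not from the original code):

```python
import multiprocessing as mp

# Module-level state defined in the parent process.
GLOBAL_CONFIG = {"lr": 0.001, "epochs": 24}

def _worker(q):
    # Under 'fork' the child inherits the parent's memory, so
    # GLOBAL_CONFIG is visible here without any pickling step.
    q.put(GLOBAL_CONFIG["lr"])

def run_fork_demo():
    ctx = mp.get_context("fork")  # 'fork' is Unix-only
    q = ctx.Queue()
    p = ctx.Process(target=_worker, args=(q,))
    p.start()
    result = q.get()
    p.join()
    return result

if __name__ == "__main__":
    print(run_fork_demo())  # 0.001, inherited from the parent
```

Under `spawn`, by contrast, the child starts a fresh interpreter and every argument passed to it goes through `pickle`, which is exactly where the `dict_keys` error above originates.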

However, the `fork` start method has some limitations. It cannot be used on Windows, and it can also cause problems if the child process uses libraries that are not thread-safe.

In general, the `fork` start method is a good choice for most use cases, but it is important to be aware of its limitations. If you are unsure which start method to use, you can fall back to the `spawn` method (the default on Windows and macOS), which is more portable but less efficient.
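One way to handle this portably in a training script is to choose the start method at runtime. A hedged sketch (the `pick_start_method` helper is hypothetical, not part of the original train.py):

```python
import multiprocessing as mp

def pick_start_method():
    # Prefer 'fork' where the platform supports it (Unix); otherwise
    # fall back to the portable 'spawn' method.
    if "fork" in mp.get_all_start_methods():
        return "fork"
    return "spawn"

if __name__ == "__main__":
    # force=True avoids a RuntimeError if a start method has already
    # been set elsewhere (e.g. by a launcher or an imported library).
    mp.set_start_method(pick_start_method(), force=True)
    print(mp.get_start_method())
```

Note that `set_start_method` may only be called once per process unless `force=True` is passed, so it belongs at the very top of the `__main__` block, before any DataLoader or DDP setup.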
