You may need to install 'nccl2' from NVIDIA official website

Error message

Multi-GPU training with paddle failed with the following error:

W0111 17:25:32.685145 56257 dynamic_loader.cc:207] You may need to install 'nccl2' from NVIDIA official website: https://developer.nvidia.com/nccl/nccl-download before install PaddlePaddle.
Traceback (most recent call last):
  File "tools/train.py", line 114, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 47, in main
    dist.init_parallel_env()
  File "/home/disk0/zw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 181, in init_parallel_env
    parallel_helper._init_parallel_ctx()
  File "/home/disk0/zw/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel_helper.py", line 42, in _init_parallel_ctx
    parallel_ctx__clz.init()
RuntimeError: (PreconditionNotMet) The third-party dynamic library (libnccl.so) that Paddle depends on is not configured correctly. (error code is libnccl.so: cannot open shared object file: No such file or directory)
Suggestions:

  1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
  2. Configure third-party dynamic library environment variables as follows:
  • Linux: set LD_LIBRARY_PATH by export LD_LIBRARY_PATH=...
  • Windows: set PATH by set PATH=XXX;
(at /paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:234)
[Hint: If you need C++ stacktraces for debugging, please set `FLAGS_call_stack_level=2`.]

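Root cause analysis

Environment
  • python: 3.7
  • cuda: 10.0
  • cudnn: 7.6
  • paddlepaddle-gpu: 2.0.0rc1

The error above points directly at the problem: the dynamic loader could not find libnccl.so. There are two possible causes:

  1. nccl is not installed
  2. the directory containing libnccl.so is not in the LD_LIBRARY_PATH environment variable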
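
Before changing anything, it can help to reproduce the loader failure outside of paddle. The sketch below is not part of paddle; `probe_nccl` is a hypothetical helper name, and it assumes that asking ctypes to open libnccl.so exercises the same dynamic-loader lookup that fails in the traceback above.

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple


def probe_nccl(lib_name: str = "libnccl.so") -> Tuple[bool, Optional[str]]:
    """Try to load NCCL through the system dynamic loader.

    Returns (loaded, detail). On failure, detail is the loader's error text.
    On success, detail is whatever ctypes.util.find_library("nccl") reports,
    which may be None.
    """
    try:
        ctypes.CDLL(lib_name)
    except OSError as exc:
        # Same class of failure as "cannot open shared object file".
        return False, str(exc)
    return True, ctypes.util.find_library("nccl")


if __name__ == "__main__":
    loaded, detail = probe_nccl()
    if loaded:
        print("libnccl.so loaded; find_library reports:", detail)
    else:
        print("libnccl.so could not be loaded:", detail)
```

If the probe fails both before and after installing the packages, the second cause (the library directory is not on the loader's search path) becomes the more likely one.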

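Solution

Install nccl

Choose the nccl release that matches your cuda version. It can be downloaded from the NVIDIA website: https://developer.nvidia.com/nccl/nccl-legacy-downloads
The steps below use cuda 10.0 as the example.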

1. Download nccl-repo-ubuntu1604-2.6.4-ga-cuda10.0_1-1_amd64.deb
2. Install the local repository package
sudo dpkg -i nccl-repo-ubuntu1604-2.6.4-ga-cuda10.0_1-1_amd64.deb
3. Update the package index
sudo apt update
4. Install nccl
sudo apt install libnccl2=2.6.4-1+cuda10.0 libnccl-dev=2.6.4-1+cuda10.0
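
After the install steps above, one way to confirm which nccl release the loader actually picks up is to call `ncclGetVersion` from the NCCL C API through ctypes. This is a minimal sketch rather than an official tool: the helper names are hypothetical, the default library name `libnccl.so.2` is an assumption about what the libnccl2 package ships, and the decoding rule is taken from the version-code macro in nccl.h (releases up to 2.8 use major*1000 + minor*100 + patch, later ones major*10000 + minor*100 + patch).

```python
import ctypes
from typing import Optional, Tuple


def decode_nccl_version(code: int) -> Tuple[int, int, int]:
    """Decode the integer written by ncclGetVersion into (major, minor, patch).

    Assumption: codes below 10000 use the older major*1000 layout,
    codes from 10000 upward use the newer major*10000 layout.
    """
    base = 10000 if code >= 10000 else 1000
    major, rest = divmod(code, base)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def installed_nccl_version(
    lib_name: str = "libnccl.so.2",  # assumed soname; adjust if yours differs
) -> Optional[Tuple[int, int, int]]:
    """Return the NCCL version the loader finds, or None if unavailable."""
    try:
        lib = ctypes.CDLL(lib_name)
        get_version = lib.ncclGetVersion
    except (OSError, AttributeError):
        return None
    version = ctypes.c_int(0)
    status = get_version(ctypes.byref(version))
    if status != 0:  # ncclSuccess is 0
        return None
    return decode_nccl_version(version.value)


if __name__ == "__main__":
    print("NCCL version:", installed_nccl_version())
```

With the packages from step 4 installed, the expectation is a result of (2, 6, 4); a result of None means the library still cannot be loaded.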

Add nccl to the environment variables

The default install directory for nccl is /usr/lib/x86_64-linux-gnu. Edit ~/.bashrc and add the following lines:

# Set the cuda library directory
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64
# Append the nccl directory to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu

Save the file and run source ~/.bashrc to apply the change, then run echo $LD_LIBRARY_PATH to check that the variable is set. If the configuration succeeded, the output is:

/usr/local/cuda-10.0/lib64:/usr/lib/x86_64-linux-gnu
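
Seeing the directory in LD_LIBRARY_PATH does not by itself prove that libnccl.so is in it. The following sketch lists which entries of the variable actually contain an nccl library file; `dirs_with_nccl` is a hypothetical helper, and matching on the file name pattern `libnccl.so*` is an assumption about how the packages name their files.

```python
import glob
import os
from typing import List


def dirs_with_nccl(ld_library_path: str) -> List[str]:
    """Return the LD_LIBRARY_PATH entries that contain a libnccl.so* file."""
    found = []
    for directory in ld_library_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing, or doubled separators.
        if directory and glob.glob(os.path.join(directory, "libnccl.so*")):
            found.append(directory)
    return found


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("Entries containing libnccl:", dirs_with_nccl(current))
```

An empty list here, despite a correct-looking echo output, suggests that nccl was installed to a different directory than the one added to ~/.bashrc.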

References:

  1. https://forums.developer.nvidia.com/t/have-strange-problem-on-installing-nccl/60654
  2. https://zhuanlan.zhihu.com/p/174710896
  3. https://github.com/PaddlePaddle/PaddleDetection/issues/1444
  4. https://developer.nvidia.com/nccl/nccl-legacy-downloads