Installing horovod with pip fails with the following error:
/usr/bin/ld: cannot find -lnccl_static
collect2: error: ld returned 1 exit status
error: NCCL 2.0 library or its later version was not found (see error above).
Please specify correct NCCL location with the HOROVOD_NCCL_HOME environment variable or combination of HOROVOD_NCCL_INCLUDE and HOROVOD_NCCL_LIB environment variables.
HOROVOD_NCCL_HOME - path where NCCL include and lib directories can be found
HOROVOD_NCCL_INCLUDE - path to NCCL include directory
HOROVOD_NCCL_LIB - path to NCCL lib directory
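HOROVOD_NCCL_HOME, as the message describes, is expected to be a single directory containing include/ and lib/ subdirectories. The following is a minimal sketch, not part of horovod, that checks such a directory before pip is re-run. It assumes the build needs nccl.h under include/ and libnccl_static.a under lib/, since libnccl_static.a is the file that -lnccl_static asks the linker for. The function name nccl_home_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if DIR has the layout HOROVOD_NCCL_HOME
# is described as expecting (DIR/include and DIR/lib).
nccl_home_ok() {
    nccl_home_dir="$1"
    # Header needed to compile horovod's NCCL code.
    if [ ! -f "$nccl_home_dir/include/nccl.h" ]; then
        echo "missing $nccl_home_dir/include/nccl.h" >&2
        return 1
    fi
    # Static archive that the linker flag -lnccl_static looks for.
    if [ ! -f "$nccl_home_dir/lib/libnccl_static.a" ]; then
        echo "missing $nccl_home_dir/lib/libnccl_static.a" >&2
        return 1
    fi
    return 0
}
```

Example usage, where /opt/nccl is only a placeholder path: `nccl_home_ok /opt/nccl && HOROVOD_NCCL_HOME=/opt/nccl pip install horovod`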
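Cause: the installed NCCL version is too old, so the linker cannot find libnccl_static.a.
Solution: run the following commands (as root) to add NVIDIA's machine-learning apt repository and install the NCCL 2 runtime (libnccl2) and development (libnccl-dev) packages: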
echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
apt-get update
apt-get install libnccl2 libnccl-dev
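If pip still reports the same error after the packages are installed, the include and lib directories can be passed separately with HOROVOD_NCCL_INCLUDE and HOROVOD_NCCL_LIB. The sketch below is an assumption-laden helper, not part of horovod: the function name find_nccl_static is hypothetical, and it simply prints the first directory among its arguments that contains libnccl_static.a. The candidate paths in the usage line (/usr/include, /usr/lib/x86_64-linux-gnu, /usr/local/cuda/lib64) are guesses for x86_64 Ubuntu; `dpkg -L libnccl-dev` lists where the package actually put its files.

```shell
# Hypothetical helper: print the first directory (from the arguments)
# that contains libnccl_static.a, the archive -lnccl_static resolves to.
find_nccl_static() {
    for dir in "$@"; do
        if [ -f "$dir/libnccl_static.a" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # Nothing found: report the searched directories and fail.
    echo "libnccl_static.a not found in: $*" >&2
    return 1
}
```

Example usage with the assumed paths: `NCCL_LIB_DIR=$(find_nccl_static /usr/lib/x86_64-linux-gnu /usr/local/cuda/lib64) && HOROVOD_NCCL_INCLUDE=/usr/include HOROVOD_NCCL_LIB="$NCCL_LIB_DIR" pip install horovod`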