1. Installation and Deployment
1.1 Installing with pip
pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
Download: https://developer.hpccube.com/ -> 资源工具 (Resource Tools) -> AI生态包 (AI Ecosystem Packages) -> horovod
• Install: pip3 install horovod-0.22.1*.whl
• List installed packages: pip3 list
• Uninstall: pip3 uninstall horovod
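To confirm that the wheel was actually installed, the output of `pip3 list` can be narrowed down to a single package. The sketch below is a minimal helper built on `pip3 show`; the function name `check_pkg` and the `PIP` override variable are illustrative assumptions, not part of the original guide.

```shell
# Report the installed version of a Python package via `pip show`.
# PIP is an assumed override hook; it defaults to pip3 as used in this section.
PIP="${PIP:-pip3}"

check_pkg() {
    local pkg="$1"
    local version
    # `pip show` prints a "Version: x.y.z" line for an installed package
    version="$("$PIP" show "$pkg" 2>/dev/null | awk '/^Version:/ {print $2}')"
    if [ -n "$version" ]; then
        echo "$pkg $version"
    else
        echo "$pkg not installed"
        return 1
    fi
}

# Usage: check_pkg horovod
```

In an environment where Horovod is installed, `horovodrun --check-build` can additionally be used to list the frameworks and controllers the build supports.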
1.2 Installing with Docker
Pull an image (from 光源 / SourceFind: https://www.sourcefind.cn/ -> 容器镜像 (Container Images))
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch-horovod:1.10.0-centos7.6-dtk-22.10-py37-latest
docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow-horovod:1.15.1-centos7.6-dtk-22.10-py37-latest
docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow-horovod:2.7.0-centos7.6-dtk-22.10-py37-latest
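The three image references above share one naming pattern and differ only in framework and framework version. As a rough sketch, a small helper can assemble the reference from those two values; the function name `horovod_image` is hypothetical, and the fixed suffix (centos7.6, dtk-22.10, py37) is copied from the tags above, so other combinations may not exist in the registry.

```shell
# Build an image reference following the pattern of the three images listed above.
# Only the framework name and its version vary; the rest is taken verbatim
# from those tags and is not guaranteed to exist for other combinations.
horovod_image() {
    local framework="$1"   # pytorch or tensorflow
    local version="$2"     # 1.10.0, 1.15.1 or 2.7.0 in the list above
    echo "image.sourcefind.cn:5000/dcu/admin/base/${framework}-horovod:${version}-centos7.6-dtk-22.10-py37-latest"
}

# Usage:
#   image_name="$(horovod_image tensorflow 2.7.0)"
#   docker pull "$image_name"
```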
Create a container
docker run -it --network=host --name=${container_name} --privileged --device=/dev/kfd --device=/dev/dri --ipc=host \
    --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined ${image_name} bash
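The create command relies on two shell variables, `${container_name}` and `${image_name}`, that must be set beforehand. One possible way to handle this is a helper that takes both as arguments and prints the full command for review before it is executed; the function name `docker_run_cmd` and the example container name are assumptions made for illustration, while the flags are the ones used in this section.

```shell
# Print the `docker run` command for a given container name and image.
# /dev/kfd and /dev/dri are the compute and render device nodes passed into
# the container; --shm-size is written as a single option.
docker_run_cmd() {
    local container_name="$1"
    local image_name="$2"
    echo "docker run -it --network=host --name=${container_name}" \
         "--privileged --device=/dev/kfd --device=/dev/dri --ipc=host" \
         "--shm-size=16G --group-add video --cap-add=SYS_PTRACE" \
         "--security-opt seccomp=unconfined ${image_name} bash"
}

# Usage (hvd-test is a hypothetical container name):
#   docker_run_cmd hvd-test "$image_name"          # print the command only
#   eval "$(docker_run_cmd hvd-test "$image_name")" # print, then run it
```

Printing first makes it easy to spot an empty variable, which would otherwise produce a confusing Docker error.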
Start / stop the container
docker start ${container_name} / docker stop ${container_name}
Enter the container
docker exec -it ${container_name} /bin/bash
Remove the container / image
docker rm ${container_name} / docker rmi ${image_name}
Verify the base environment
rocminfo | grep Z100
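The `grep` check above prints matching lines but leaves the interpretation to the reader. A minimal sketch of a scripted version follows; it reads `rocminfo` output from standard input and reports whether the expected device name appears. The function name `check_dcu` and the messages it prints are illustrative assumptions.

```shell
# Report whether the text on stdin mentions the given device name (default: Z100).
# Returns 0 when the name is found and 1 otherwise, so it can gate later steps.
check_dcu() {
    local model="${1:-Z100}"
    if grep -q "$model"; then
        echo "found $model"
    else
        echo "no $model device reported"
        return 1
    fi
}

# Usage: rocminfo | check_dcu Z100
```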