Setting Up an AMD GPU Training Platform: The Full Tinkering Log (continuously updated; bookmarks, shares, likes, and coins welcome)
======================================================================================
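Conclusion first: if you don't want to tinker, just install ROCm 3.1 and then install my prebuilt wheel. The wheel is already built, so there is no need to build it again:
Set up the environment:
https://github.com/RadeonOpenCompute/ROCm
Download the prebuilt wheel:
https://download.csdn.net/download/znsoft/12246098
After downloading, simply run
pip install xxxx.whl  # xxxx.whl is the name of the file you downloaded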
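A prebuilt wheel only installs on the Python version it was built for, so it can save a failed pip run to check the file name first. The sketch below assumes the downloaded file follows the standard wheel naming scheme (name-version-python-abi-platform.whl); the file name used in it is hypothetical, not the actual name of the download above.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its standard components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    # 5 parts normally, 6 when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: %s" % filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_interpreter(filename, version_info=None):
    """Return True if the wheel's python tag names this CPython version."""
    info = sys.version_info if version_info is None else version_info
    tag = "cp%d%d" % (info[0], info[1])
    return tag in parse_wheel_name(filename)["python"].split(".")


if __name__ == "__main__":
    # Hypothetical file name; substitute the one you downloaded.
    name = "torch-1.4.0-cp36-cp36m-linux_x86_64.whl"
    print(parse_wheel_name(name))
    print("matches this interpreter:", matches_interpreter(name))
```

If matches_interpreter returns False, pip will most likely refuse the file as "not a supported wheel on this platform", and building from source as described below is the way out.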
=================== If you'd rather not build from source, skip everything below and just use the install above ===================
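I. Conclusion:
Software environment:
Ubuntu 18.04, ROCm 3.1, PyTorch 1.4, TensorFlow 2.1, TensorFlow 1.15.2
Hardware platform:
Intel 6700, 64 GB RAM
Radeon VII (HIS edition, bought on 2020-03-12 for 3,529 CNY)
In this environment, PyTorch and TF1/TF2 all passed testing and reached the expected performance.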
II. The tinkering
Prepare the ROCm environment. There is not much to say here; just follow AMD's official guide:
https://github.com/RadeonOpenCompute/ROCm
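III. Building, testing, packaging, and installing
3.1 The big build pitfall
The build hits an error that stops compilation. The fix is to force a cast to float.
The error is reported in /caffe2/operators/hip/relu_op.h under the source directory (shown in the original screenshot). Just add (float) before the variable that follows the ? operator,
e.g. __floats2half2_rn(xx.x > 0.0f ? (float)xx.x : 0.0f, ....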
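The manual edit above can also be applied mechanically. The sketch below is one possible way to do it, not the author's method: it inserts (float) in front of the identifier that follows each ? in a line and leaves already-cast operands alone. The helper name add_float_cast and the sample line are illustrative assumptions; the sample is a stand-in for the real source line, not a copy of it.

```python
import re

# Matches the operand right after "?" unless it already carries a (float) cast.
_TERNARY_TRUE_BRANCH = re.compile(r"\?\s*(?!\(float\))([A-Za-z_][\w.]*)")


def add_float_cast(line):
    """Insert (float) before the identifier that follows each '?'."""
    return _TERNARY_TRUE_BRANCH.sub(r"? (float)\1", line)


if __name__ == "__main__":
    # Illustrative stand-in for the offending expression.
    sample = "f(xx.x > 0.0f ? xx.x : 0.0f, xx.y > 0.0f ? xx.y : 0.0f)"
    print(add_float_cast(sample))
```

Running the function twice on the same line changes nothing the second time, so it is safe to re-run. Review the resulting diff by hand before building, since a regex cannot tell whether a given ternary actually needs the cast.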
IV. Getting to work
2. Install the libraries needed during the build
sudo apt update
sudo apt install rock-dkms rocm-dev rocm-libs miopen-hip miopengemm hipsparse rccl rocthrust hipcub roctracer-dev
sudo apt install git python-pip libopenblas-dev cmake libnuma-dev autoconf build-essential ca-certificates curl libgoogle-glog-dev libhiredis-dev libiomp-dev libleveldb-dev liblmdb-dev libopencv-dev libpthread-stubs0-dev libsnappy-dev sudo vim libprotobuf-dev protobuf-compiler
pip install enum34 numpy pyyaml setuptools typing cffi future hypothesis
3. Download the source:
cd ~
git clone https://github.com/pytorch/pytorch.git
or
git clone https://github.com/ROCmSoftwarePlatform/pytorch.git
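or, as currently recommended, build from the tagged release below. All of these repositories run into the pitfall described in 3.1.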
git clone -b v1.4.0 https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
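4. Prepare the build environment: patch the ROCm CMake config files (they live under /opt/rocm, so root privileges are likely needed)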
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/hipsparse/lib/cmake/hipsparse/hipsparse-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rccl/lib/cmake/rccl/rccl-config.cmake
Then add set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl") to pytorch/cmake/External/rccl.cmake
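The find_dependency(hip) to find_dependency(HIP) substitution can be checked before anything under /opt/rocm is modified. The sketch below is a Python equivalent of the sed commands with a dry-run default; the helper names (config_path, patch_text, patch_file) are made up for illustration, and the paths are simply rebuilt from the pattern the sed commands follow.

```python
import os

OLD = "find_dependency(hip)"
NEW = "find_dependency(HIP)"

# Components whose config files the sed commands touch.
COMPONENTS = ["rocsparse", "rocfft", "miopen", "rocblas", "hipsparse", "rccl"]


def config_path(component, rocm_root="/opt/rocm"):
    """Build <root>/<name>/lib/cmake/<name>/<name>-config.cmake."""
    return os.path.join(rocm_root, component, "lib", "cmake",
                        component, component + "-config.cmake")


def patch_text(text):
    """Same substitution as the sed expression, applied to a string."""
    return text.replace(OLD, NEW)


def patch_file(path, dry_run=True):
    """Return True if the file needs (or, with dry_run=False, received) the patch."""
    with open(path) as handle:
        original = handle.read()
    patched = patch_text(original)
    if patched == original:
        return False
    if not dry_run:
        with open(path, "w") as handle:
            handle.write(patched)
    return True


if __name__ == "__main__":
    for component in COMPONENTS:
        path = config_path(component)
        if not os.path.exists(path):
            print("missing:", path)
            continue
        print(path, "-> needs patch" if patch_file(path) else "-> ok")
```

Run as is, the script only reports which files still contain the lowercase form; call patch_file(path, dry_run=False) with sufficient privileges to write the change.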
Install NCCL:
https://blog.csdn.net/lwplwf/article/details/82788818
5. Hipify: convert the CUDA functions
cd pytorch
python tools/amd_build/build_amd.py
Set the environment variables (gfx906 is the architecture of the Radeon VII):
export PYTORCH_ROCM_ARCH=gfx906
export HCC_AMDGPU_TARGET=gfx906
Build:
USE_CUDA=OFF USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=16 python setup.py install --user
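For repeated builds it can be convenient to keep the exported variables and the inline build flags in one place. The sketch below collects exactly the settings shown above into a single environment dictionary and passes it to setup.py; the function names rocm_build_env and build are hypothetical, and the defaults assume the same Radeon VII (gfx906) target.

```python
import os
import subprocess

BUILD_COMMAND = ["python", "setup.py", "install", "--user"]


def rocm_build_env(arch="gfx906", max_jobs=16, base_env=None):
    """Return a copy of the environment with the ROCm build settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "PYTORCH_ROCM_ARCH": arch,
        "HCC_AMDGPU_TARGET": arch,
        "USE_CUDA": "OFF",
        "USE_ROCM": "1",
        "USE_LMDB": "1",
        "USE_OPENCV": "1",
        "MAX_JOBS": str(max_jobs),
    })
    return env


def build(source_dir, **kwargs):
    """Run the build in source_dir; raises CalledProcessError on failure."""
    subprocess.check_call(BUILD_COMMAND, cwd=source_dir,
                          env=rocm_build_env(**kwargs))
```

A call such as build(os.path.expanduser("~/pytorch"), max_jobs=8) would run the same command with fewer parallel jobs, which may help on machines where 16 jobs exhaust memory.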
Test:
PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
Build the wheel (saved under /root):
python setup.py bdist_wheel -d /root/
The last step is to install it:
pip install /path/to/your/wheel
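After installing, a quick sanity check is to ask the installed build what it sees. The sketch below is a minimal check, not part of the original procedure. It relies on the fact that ROCm builds of PyTorch expose the GPU through the torch.cuda namespace, and it reads torch.version.hip defensively because not every build provides that attribute. The function name describe_torch is made up for illustration.

```python
import importlib


def describe_torch():
    """Report what the installed torch build says about itself."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"installed": False}
    report = {
        "installed": True,
        "version": torch.__version__,
        # ROCm builds expose the GPU through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
        # Not every build has torch.version.hip, hence getattr.
        "hip": getattr(torch.version, "hip", None),
    }
    if report["gpu_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(describe_torch())
```

If gpu_available comes back False on a machine with a working ROCm install, the usual suspects are a wheel built for a different architecture than the card, or the user lacking permission to access the GPU device.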