HPL_GPU repository
https://github.com/reger-men/HPL_GPU.git
Environment
- ROCm version: 5.3.22061-e8e78f1a
- Container created from the rocm/rocm-terminal image (latest version)
- UCX
- Open MPI
- OSU micro-benchmarks
- OpenBLAS
- rocBLAS
- rocRAND
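The notes do not record how the container was started. As a minimal sketch, the command below follows the usual ROCm container invocation; the device and security flags are assumptions about the setup, not something taken from the original notes. The helper name `rocm_container_cmd` is hypothetical and only prints the command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: build the command line for an interactive rocm/rocm-terminal container.
# /dev/kfd and /dev/dri are the device nodes that expose the GPU to ROCm.
# The flags are assumed; adjust them to your host.
rocm_container_cmd() {
    echo docker run -it \
        --device=/dev/kfd --device=/dev/dri \
        --group-add video \
        --security-opt seccomp=unconfined \
        rocm/rocm-terminal
}

# Print the command for review instead of running it directly.
rocm_container_cmd
```

To actually start the container, run the printed command in a shell on a host with Docker and the ROCm kernel driver installed.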
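Installing the environment
Note: install every library under /usr/local; otherwise the software may not be found when commands are run with sudo.
- Install UCX, Open MPI, and OSU
# Install UCX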
git clone https://github.com/openucx/ucx.git
cd ucx
# git checkout v1.10.x # optional to use v1.10.x branch
./autogen.sh
mkdir build
cd build
# configure and build as the normal user; only the install step writes to /usr/local
../contrib/configure-opt --prefix=/usr/local --with-rocm=/opt/rocm --without-knem --without-cuda --enable-gtest --enable-examples
make
sudo make install
# Install Open MPI
cd ../..  # leave ucx/build so the next checkout is not nested inside it
git clone --recursive -b v4.1.x https://github.com/open-mpi/ompi.git
cd ompi
./autogen.pl
mkdir build
cd build
../configure --prefix=/usr/local --with-ucx=/usr/local --without-verbs
make
sudo make install
# Install OSU
cd ../..  # leave ompi/build
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.7.tar.gz
tar xvf osu-micro-benchmarks-5.7.tar.gz
mv osu-micro-benchmarks-5.7 osu
cd osu
./configure --enable-rocm --with-rocm=/opt/rocm CC=/usr/local/bin/mpicc CXX=/usr/local/bin/mpicxx LDFLAGS="-L/usr/local/lib/ -lmpi -L/opt/rocm/lib/ $(hipconfig -C) -lamdhip64" CPPFLAGS="-std=c++11"
make  # configure alone builds nothing; this compiles the benchmarks
Alternatively, follow https://github.com/openucx/ucx/wiki/Build-and-run-ROCM-UCX-OpenMPI for this step.
- Install OpenBLAS
sudo git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
sudo make TARGET=NEHALEM
sudo make PREFIX=/usr/local install
Alternatively, follow https://blog.csdn.net/weixin_41477306/article/details/99727496 for this step.
- Install rocBLAS and rocRAND
sudo apt-get update
sudo apt-get install rocblas
sudo apt-get install rocrand
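Before building HPL_GPU it can help to confirm that the freshly installed tools are actually visible on PATH. The following is a minimal sketch, not part of the original notes; `check_tools` is a hypothetical helper, and the tool names are the binaries that UCX and Open MPI normally install.

```shell
#!/usr/bin/env bash
# Report whether each named command can be found on PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
    _missing=0
    for _tool in "$@"; do
        if command -v "$_tool" >/dev/null 2>&1; then
            echo "found:   $_tool -> $(command -v "$_tool")"
        else
            echo "missing: $_tool"
            _missing=1
        fi
    done
    return "$_missing"
}

# Tools expected after the UCX and Open MPI installs above.
check_tools ucx_info mpicc mpirun ompi_info \
    || echo "some tools are missing; re-check the install prefix"
```

If everything is found, `ucx_info -d | grep -i rocm` and `ompi_info | grep -i ucx` give a further hint that UCX was built with ROCm support and that Open MPI picked up UCX.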
Building from source
git clone https://github.com/reger-men/HPL_GPU.git
cd HPL_GPU
mkdir build && cd build
sudo cmake .. -DMPI_DIR=/usr/local -DBLAS_DIR=/usr/local
sudo make -j
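Editing the configuration files
- Edit HPL.dat: N can be any value (ideally chosen so that the matrix fills about 80% of memory), P and Q are both 1 (single GPU), and NB should preferably be a multiple of 32. Leave the other settings unchanged.
- Edit mpirun_xhplhip.sh
  - Change the Open MPI path to match your installation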
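The 80% guideline for N can be turned into a number as follows. HPL stores an N x N matrix of 8-byte doubles, so N is roughly sqrt(0.8 * memory_bytes / 8), rounded down to a multiple of NB. This is a minimal sketch; the function name `hpl_problem_size` and the example values (16 GiB, NB = 384) are illustrative assumptions, and which memory pool to size against depends on your hardware.

```shell
#!/usr/bin/env bash
# hpl_problem_size MEM_GIB NB [FRACTION]
# Prints the largest multiple of NB such that an N x N matrix of 8-byte
# doubles fits in FRACTION (default 0.8) of MEM_GIB GiB.
hpl_problem_size() {
    awk -v gib="$1" -v nb="$2" -v frac="${3:-0.8}" 'BEGIN {
        bytes = gib * 1073741824          # GiB -> bytes
        n = int(sqrt(frac * bytes / 8))   # 8 bytes per double-precision entry
        print int(n / nb) * nb            # round down to a multiple of NB
    }'
}

# Example (assumed values): 16 GiB of memory, NB = 384.
hpl_problem_size 16 384   # prints 41088
```

The result goes on the `Ns` line of HPL.dat, with the chosen block size on the `NBs` line and both `Ps` and `Qs` set to 1 for a single GPU.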
Running HPL_GPU
sudo ./mpirun_xhplhip.sh
Possible problems
- Open MPI error
mpirun: error while loading shared libraries: libopen-rte.so.40: cannot open shared object file: No such file or directory
# Fix: refresh the shared-library cache so the libraries in /usr/local/lib are found
sudo ldconfig
- Missing packages
The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:
plm_rsh_agent: ssh : rsh
Please either unset the parameter, or check that the path is correct
# Fix: install an SSH package so that mpirun can find ssh
sudo apt install openssh-server
By not providing "Findrocrand.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "rocrand", but
CMake did not find one.
# Fix: install rocRAND
sudo apt install rocrand