Installing and Running the Project Group's HPL_GPU on a Single Node

HPL_GPU repository

https://github.com/reger-men/HPL_GPU.git

Environment

  • ROCm version: 5.3.22061-e8e78f1a
  • Container created from the rocm/rocm-terminal image (latest version)
  • UCX
  • Open MPI
  • OSU Micro-Benchmarks (OSU)
  • OpenBLAS
  • rocBLAS
  • rocRAND

Setting up the environment

Note: install every library under /usr/local. Otherwise commands run with sudo may not find the software.
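A minimal sketch for checking this on a particular machine, assuming sudo is configured with a `secure_path` (the Ubuntu default), which replaces the caller's PATH when running commands. `path_has_dir` is a hypothetical helper written for this check, not part of any package above.

```shell
# path_has_dir DIR PATHSTR  (hypothetical helper)
# Succeeds if DIR is exactly one of the ':'-separated entries in PATHSTR.
path_has_dir() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: ask sudo which PATH it really uses, then look for /usr/local/bin.
#   path_has_dir /usr/local/bin "$(sudo sh -c 'echo "$PATH"')" && echo "sudo sees /usr/local/bin"
```

Wrapping both strings in ':' means a partial match such as /usr/local against /usr/local/bin is correctly rejected.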

  1. Install UCX, Open MPI and OSU
# Install UCX (build as a normal user, install into /usr/local with sudo)
git clone https://github.com/openucx/ucx.git
cd ucx
# git checkout v1.10.x # optional: use the v1.10.x branch
./autogen.sh
mkdir build
cd build
../contrib/configure-opt --prefix=/usr/local --with-rocm=/opt/rocm --without-knem --without-cuda --enable-gtest --enable-examples
make -j$(nproc)
sudo make install
cd ../..

# Install Open MPI (built against the UCX installed above)
git clone --recursive -b v4.1.x https://github.com/open-mpi/ompi.git
cd ompi
./autogen.pl
mkdir build
cd build
../configure --prefix=/usr/local --with-ucx=/usr/local --without-verbs
make -j$(nproc)
sudo make install
cd ../..

# Install OSU
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.7.tar.gz
tar xvf osu-micro-benchmarks-5.7.tar.gz
mv osu-micro-benchmarks-5.7 osu
cd osu
./configure --enable-rocm --with-rocm=/opt/rocm CC=/usr/local/bin/mpicc CXX=/usr/local/bin/mpicxx LDFLAGS="-L/usr/local/lib/ -lmpi -L/opt/rocm/lib/ $(hipconfig -C) -lamdhip64" CPPFLAGS="-std=c++11"
make -j$(nproc)

Alternatively, follow the instructions at https://github.com/openucx/ucx/wiki/Build-and-run-ROCM-UCX-OpenMPI
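Once this step is done, it is worth confirming that Open MPI actually picked up UCX before building HPL_GPU on top of it. The sketch below assumes `ompi_info` lists components as `MCA pml: ucx (...)`, as Open MPI 4.x normally does. `has_ucx_pml` is a hypothetical helper. The OSU command in the comments assumes OSU 5.7's ROCm device option (`-d rocm` with `D D` device buffers); check `osu_bw -h` on your build if it differs.

```shell
# has_ucx_pml  (hypothetical helper)
# Reads `ompi_info` output on stdin; succeeds if the UCX PML component is listed.
has_ucx_pml() {
  grep -q 'MCA pml: ucx'
}

# Usage (after the installs above):
#   ompi_info | has_ucx_pml && echo "Open MPI was built with UCX"
#   ucx_info -d | grep -i rocm        # ROCm transports appear here if UCX found ROCm
#   cd osu && mpirun -np 2 --mca pml ucx ./mpi/pt2pt/osu_bw -d rocm D D   # assumed flags
```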

  2. Install OpenBLAS
sudo git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
sudo make TARGET=NEHALEM
sudo make PREFIX=/usr/local install 

Alternatively, follow the instructions at https://blog.csdn.net/weixin_41477306/article/details/99727496

  3. Install rocBLAS and rocRAND
	sudo apt-get update
	sudo apt-get install rocblas
	sudo apt-get install rocrand

Building HPL_GPU from source

git clone https://github.com/reger-men/HPL_GPU.git
cd HPL_GPU
mkdir build && cd build
sudo cmake .. -DMPI_DIR=/usr/local -DBLAS_DIR=/usr/local
sudo make -j
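If CMake fails to find MPI or BLAS, a quick pre-flight check of the directories passed as `-DMPI_DIR` and `-DBLAS_DIR` helps separate "not installed" from "installed somewhere else". A minimal sketch with a hypothetical `check_files` helper; the file names in the usage comment are assumptions about typical Open MPI, OpenBLAS and ROCm install layouts, not the exact files HPL_GPU's CMakeLists.txt searches for.

```shell
# check_files DIR FILE...  (hypothetical helper)
# Prints each FILE missing under DIR to stderr; fails if any is missing.
check_files() {
  local dir=$1 f missing=0
  shift
  for f in "$@"; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumed file names):
#   check_files /usr/local bin/mpicc lib/libmpi.so lib/libopenblas.so &&
#   check_files /opt/rocm lib/librocblas.so lib/librocrand.so &&
#   cmake .. -DMPI_DIR=/usr/local -DBLAS_DIR=/usr/local
```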

Editing the configuration files

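  • In HPL.dat: N can be any value (ideally the matrix should take up about 80% of memory), set both P and Q to 1 (single GPU), and preferably make NB a multiple of 32. The other parameters can be left unchanged.
  • Edit the mpirun_xhplhip.sh script:
    • change the Open MPI path to where it was installed (/usr/local above)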
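The 80% rule can be turned into a number. Each matrix element is an 8-byte double, so an N×N matrix needs 8N² bytes; setting 8N² ≈ 0.8·M for a memory size of M bytes gives N ≈ √(M/10), which is then rounded down to a multiple of NB. The sketch below uses two hypothetical helpers: `hpl_n` computes that N, and `set_hpl_param` rewrites one value line of HPL.dat with GNU sed. It assumes HPL_GPU's HPL.dat follows the standard HPL layout (values first, label last, e.g. `29 30 34 35  Ns`). It writes a single value per line, so the matching `# of ...` count lines (problem sizes, NBs, process grids) must also be 1. Which memory M stands for is up to you; the post only says "memory".

```shell
# hpl_n MEM_BYTES NB  (hypothetical helper)
# Largest multiple of NB whose N x N double matrix fits in ~80% of MEM_BYTES.
hpl_n() {
  awk -v m="$1" -v nb="$2" 'BEGIN {
    n = int(sqrt(m / 10))   # 8 * N^2 = 0.8 * m  =>  N = sqrt(m / 10)
    n -= n % nb             # round down to a multiple of NB
    print n
  }'
}

# set_hpl_param FILE LABEL VALUE  (hypothetical helper, GNU sed -i)
# Replaces the numbers on the HPL.dat line whose label is LABEL (Ns, NBs, Ps, Qs).
set_hpl_param() {
  local file=$1 key=$2 value=$3
  sed -E -i "s/^[[:space:]]*[0-9][0-9 ]*[[:space:]]${key}[[:space:]]*\$/${value} ${key}/" "$file"
}

# Usage (the memory size is only an example; `rocm-smi --showmeminfo vram` or `free -b` can report it):
#   N=$(hpl_n 17179869184 256)
#   set_hpl_param HPL.dat Ns "$N"
#   set_hpl_param HPL.dat NBs 256
#   set_hpl_param HPL.dat Ps 1
#   set_hpl_param HPL.dat Qs 1
```

The pattern only matches lines made of digits and spaces followed by the label, so count lines such as `1  # of NBs` are left alone.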

Running HPL_GPU

sudo ./mpirun_xhplhip.sh
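To compare runs with different N and NB, it helps to keep the output and pull out the performance figures. A minimal sketch, assuming HPL_GPU prints the standard HPL result table (`T/V  N  NB  P  Q  Time  Gflops`, with result lines starting with `WR`); this layout has not been verified against HPL_GPU itself. `hpl_gflops` is a hypothetical helper.

```shell
# hpl_gflops  (hypothetical helper)
# Reads HPL output on stdin and prints "N Gflops" for each result line.
# Assumes columns: T/V N NB P Q Time Gflops, result lines starting with "WR".
hpl_gflops() {
  awk '$1 ~ /^WR/ && NF >= 7 { print $2, $7 }'
}

# Usage:
#   sudo ./mpirun_xhplhip.sh | tee hpl.log
#   hpl_gflops < hpl.log
```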

Possible problems

  1. Open MPI error
mpirun: error while loading shared libraries: libopen-rte.so.40: cannot open shared object file: No such file or directory
# Fix: rebuild the dynamic linker cache so the Open MPI libraries in /usr/local/lib are found
sudo ldconfig
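To check whether the dynamic linker cache already knows the Open MPI runtime library (before or after `sudo ldconfig`), a minimal sketch with a hypothetical `lib_in_cache` helper that only inspects `ldconfig -p` output. On Ubuntu-based images /usr/local/lib is normally listed in /etc/ld.so.conf.d/libc.conf, which is why a plain `sudo ldconfig` is usually enough after installing there.

```shell
# lib_in_cache SONAME  (hypothetical helper)
# Reads `ldconfig -p` output on stdin; succeeds if SONAME is listed.
# Matching "SONAME (" avoids e.g. libfoo.so.4 matching libfoo.so.40.
lib_in_cache() {
  grep -Fq -- "$1 ("
}

# Usage:
#   ldconfig -p | lib_in_cache libopen-rte.so.40 || sudo ldconfig
```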
  2. Missing software
The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:

plm_rsh_agent: ssh : rsh

Please either unset the parameter, or check that the path is correct

# Fix: mpirun needs an ssh (or rsh) client; installing openssh-server also pulls in openssh-client
sudo apt install openssh-server

By not providing "Findrocrand.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "rocrand", but
  CMake did not find one.

# Fix
sudo apt install rocrand
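Both errors come down to something mpirun or CMake cannot find, so it can help to check first. A minimal sketch with two hypothetical helpers: `have_rsh_agent` checks for an ssh or rsh client on PATH, and `find_rocrand_config` looks for rocRAND's CMake package file under a prefix. The file name `rocrand-config.cmake` is an assumption about how the rocRAND package installs its CMake config. If rocRAND is installed but CMake still reports the error above, pointing the standard `CMAKE_PREFIX_PATH` variable at /opt/rocm lets `find_package` search there.

```shell
# have_rsh_agent  (hypothetical helper)
# Succeeds if an ssh or rsh client is on PATH.
have_rsh_agent() {
  command -v ssh >/dev/null 2>&1 || command -v rsh >/dev/null 2>&1
}

# find_rocrand_config PREFIX  (hypothetical helper)
# Prints the first rocrand-config.cmake found under PREFIX (assumed file name).
find_rocrand_config() {
  find "$1" -name 'rocrand-config.cmake' -print 2>/dev/null | head -n 1
}

# Usage:
#   have_rsh_agent || sudo apt install openssh-server
#   if [ -n "$(find_rocrand_config /opt/rocm)" ]; then
#     cmake .. -DMPI_DIR=/usr/local -DBLAS_DIR=/usr/local -DCMAKE_PREFIX_PATH=/opt/rocm
#   else
#     sudo apt install rocrand
#   fi
```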