Building NWChem (ROCm version): the end goal

Reference links

  1. Installing OpenMPI and UCX under ROCm
    https://github.com/openucx/ucx/wiki/Build-and-run-ROCM-UCX-OpenMPI
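1. Check whether the GPU has large BAR enabled (a single machine does not need UCX, so this feature is not needed either; skip straight to step 3)

Following the reference document, I created the test file check_large_bar_rocm.c, but compiling it failed: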

(base) [jrf@cu06 ~]$ gcc $(/opt/rocm/bin/hipconfig --cpp_config) -L/opt/rocm/lib/ -lhip_hcc check_large_bar_rocm.c -o check_large_bar_rocm
In file included from /opt/rocm/hip/include/hip/hcc_detail/channel_descriptor.h:28:0,
                 from /opt/rocm/hip/include/hip/hcc_detail/hip_texture_types.h:38,
                 from /opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:44,
                 from /opt/rocm/hip/include/hip/hip_runtime_api.h:342,
                 from /opt/rocm/hip/include/hip/hip_runtime.h:64,
                 from check_large_bar_rocm.c:2:
/opt/rocm/hip/include/hip/hcc_detail/hip_vector_types.h:38:24: error: missing binary operator before token "("
     #if __has_attribute(ext_vector_type)

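**Trying to update gcc**
I tried updating gcc to version 9.2. conda did not offer a newer gcc, so I built it from source.
(Only later did I discover that the system already had a working 8.3.0 installed. Awkward.)
Reference: https://blog.csdn.net/l919898756/article/details/81015617
Download the source from the official site, extract it, then create a build directory and enter it: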

../configure --disable-multilib --prefix=something
make
make install

This failed with:

“Verify that you have permission to grant a GFDL license for all
new text in tm.texi, then copy it to $(srcdir)/doc/tm.texi.”

Copying the file over as instructed produced another error:

“You should edit $(srcdir)/doc/tm.texi.in rather than $(srcdir)/doc/tm.texi.”

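http://www.hellogcc.org/?p=63 explains the cause but does not give a fix.
Following the page below, I replaced the source tree with a freshly extracted copy,
https://wiki.osdev.org/Talk:GCC_Cross-Compiler
and ran this again: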

make install

This failed with:

g++: error: unrecognized command line option ‘-no-pie’

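Switching to 7.2.0
From what I could find, the cause is that the existing gcc and g++ are too old. Not knowing how to work around that, I installed version 7.2.0 instead. https://github.com/xd009642/tarpaulin/issues/7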
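A quick version check can confirm this diagnosis before starting a long rebuild. The sketch below is not from the original notes; it assumes that `-no-pie` first appeared in GCC 6, and `version_ge` is a hypothetical helper name that relies on GNU `sort -V`.

```shell
#!/bin/bash
# version_ge A B: succeeds when version A is at least version B.
# Relies on GNU sort's -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: -no-pie was introduced in GCC 6.
required=6
# Fall back to 0 when g++ is not on PATH at all.
current=$(g++ -dumpversion 2>/dev/null || echo 0)

if version_ge "$current" "$required"; then
    echo "g++ $current should accept -no-pie"
else
    echo "g++ $current predates -no-pie; use a newer bootstrap compiler"
fi
```

On a machine whose default compiler is 4.8.5, this would take the second branch, which matches the error seen here.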

wget  http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-7.2.0/gcc-7.2.0.tar.gz
tar -xzf gcc-7.2.0.tar.gz
cd gcc-7.2.0
mkdir build
cd build
../configure --disable-checking --enable-languages=c,c++ --disable-multilib --prefix=/home/jrf/tools/gcc-7.2.0 --enable-threads=posix
make -j24
make install

The installation went smoothly and ended with this message:

Libraries have been installed in:
   /home/jrf/tools/gcc-7.2.0/lib/../lib64

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
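To have the current shell pick up the freshly installed compiler and its runtime libraries, as the libtool message suggests, something like the following can be used. This is a minimal sketch: `use_toolchain` is a hypothetical helper, and the prefix is the one passed to `--prefix` above.

```shell
#!/bin/bash
# Prepend a toolchain prefix to PATH and LD_LIBRARY_PATH for this shell.
use_toolchain() {
    local prefix=$1
    export PATH="$prefix/bin:$PATH"
    # lib64 is where the install message above says the libraries went.
    export LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

use_toolchain /home/jrf/tools/gcc-7.2.0

# Show which gcc now comes first on PATH, if any.
command -v gcc || echo "gcc not found on PATH"
```

The function must run in the current shell (for example by sourcing the file) for the exported variables to persist.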

gcc updated to 7.2.0; back to the large BAR test
Compiling the BAR check file again succeeded, but running it failed:

./check_large_bar_rocm: error while loading shared libraries:
	 libhip_hcc.so: cannot open shared object file: No such file or directory

So add the directory to the dynamic library search path:

export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
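Whether the loader can now resolve every shared library can be checked with `ldd` before running the binary. The sketch below is an illustration only; `missing_libs` is a hypothetical helper that filters `ldd` output for unresolved entries.

```shell
#!/bin/bash
# missing_libs: read `ldd` output on stdin and print the names of the
# libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage with the test binary from this section (skipped if it is absent).
if [ -x ./check_large_bar_rocm ]; then
    ldd ./check_large_bar_rocm | missing_libs
fi
```

Empty output means every dependency was found; a line such as libhip_hcc.so means LD_LIBRARY_PATH still lacks the directory that holds it.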

The program now starts, but crashes with a segmentation fault:

(base) [jrf@cu06 ~]$ ./check_large_bar_rocm
address buf 0x7f7bb5200000
Segmentation fault (core dumped)

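...... A senior labmate said that a single machine does not need UCX, and according to the tutorial OpenMPI does not need to be built against ROCm, so the large BAR feature should not be required. I therefore skipped this step and went straight to the ROCm build. Because linking against the ROCm libraries failed when the check file was compiled with gcc 4.8.5, this time I used the system's 8.3.0 (under /opt/soft/) to rebuild OpenBLAS and OpenMPI, and then built NWChem with them.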

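2. Build UCX (skipped)
3. Build OpenBLAS and LAPACK with gcc 7.2.0
  1. openblas
    NWChem ships with a basic BLAS library, but for better speed it is best to build and install one yourself.
    Download the source from the official site and build it:
# Download the source
git clone https://github.com/xianyi/OpenBLAS.git
# Build
cd OpenBLAS
make
# Install to a chosen prefix
make install PREFIX=/home/jrf/tools/openblas-gcc7.2.0
  2. lapack
    OpenBLAS already includes the LAPACK library, so there is no need to install LAPACK separately. When building NWChem, just point the LAPACK and BLAS settings at the same directory.
    Reference: https://www.jianshu.com/p/fe6c4f42aa0b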
4. Build OpenMPI
Notes
  1. A single machine does not need UCX
  2. CUDA is supported from version 1.7 onward (see the guide to installing a CUDA-aware OpenMPI)
1) Steps
  1. Download the source code
    Recommended:
git clone --recursive https://github.com/open-mpi/ompi.git
  2. Run autogen.pl
 cd ompi
 ./autogen.pl
  3. Configure
    Where supported, a CUDA-aware OpenMPI can be built:
mkdir build
cd build
../configure --prefix=/home/jrf/tools/ompi-gcc8.3.0  --with-cuda=/home/apps/jinrf/tools/cuda/cuda-11.0/include --enable-mpi-ext=cuda
  4. Build and install
make
make install
2) Bugs
  1. Missing libtool
    Running ./autogen.sh produced the following error:
    Updating build configuration files, please wait....
    configure.ac:38: warning: macro 'AM_PROG_LIBTOOL' not found in library
    configure.ac:38: error: possibly undefined macro: AM_PROG_LIBTOOL
          If this token and others are legitimate, please use m4_pattern_allow.
          See the Autoconf documentation.
    autoreconf: /usr/bin/autoconf failed with exit status: 1

    The cause turned out to be a missing libtool; installing it fixes the problem:
    sudo apt-get install libtool  # on Ubuntu

  2. cannot find -lnuma
    make failed with the following error:
    /usr/bin/ld: cannot find -lnuma

    Fix: install the matching development package:
    sudo apt-get install libnuma-dev
  3. Too few arguments to "mca_pml_ob1_recv_request_ack"
    This problem appeared with OpenMPI 5.0.0. Reading the source shows that one OpenMPI function calls another of its own functions, mca_pml_ob1_recv_request_ack, with only three arguments where four are expected. Perhaps a newer gcc would tolerate this? I did not investigate further and went back to OpenMPI 4.1.0, which had built without problems.
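5. Installing NWChem

1. Download the source code
Note: the version that supports ROCm is the master branch.

git clone https://github.com/nwchemgit/nwchem.git

2. Edit the configuration file
Notes:

  1. Starting with NWChem 7.0.0, LAPACK_LIB must also be set whenever BLASOPT is set. OpenBLAS includes LAPACK, so both can be given the same path.
  2. Important: LAPACK_LIB was not set here at first, on the assumption that OpenBLAS includes part of LAPACK and that this would be enough. Earlier builds succeeded because I had manually built NWChem's bundled LAPACK library, libnwclapack.a, and added -lnwclapack to BLASOPT. I have since removed the search for nwclapack because I want to use a faster third-party library. If that does not work, see the pccompile in section 3.2 of "Building NWChem on a cluster" and use LAPACK_LIB to point at a manually built third-party LAPACK library.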
#!/bin/bash
export NWCHEM_TOP=/home/jrf/Quantum_Soft/nwchem-hip  # the documentation says this must be set to "nwchem-6.8.1"
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=MPI-PR
export USE_MPI=yes
export USE_MPIF=yes
export USE_MPIF4=yes
export TCE_HIP=yes
export NWCHEM_MODULES=all
export BLASOPT="-L/home/jrf/tools/openblas-gcc8.3.0/lib -lopenblas "
export LIBRARY_PATH=/opt/rocm/hip/lib:$LIBRARY_PATH
export LIBS="-lhip_hcc"
export LD_LIBRARY_PATH=/home/jrf/tools/ompi-gcc8.3.0/lib/:$LD_LIBRARY_PATH
export PATH=$PATH:/home/jrf/tools/ompi-gcc8.3.0/bin/
export HIP_INCLUDE="-I/opt/rocm/hip/include"
export LAPACK_LIB="-L/home/jrf/lapack-3.9.0"  # a linker flag; a bare directory makes ld fail with "Is a directory"
export C_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJC_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export CPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJCPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include

Run the command:

make nwchem_config

This ran into the following problem:

config/makefile.h:220: /home/jrf/Quantum_Soft/nwchem-hip/src/config/nwchem_config.h: No such file or directory
config/makefile.h:2739: *** Please define LAPACK_LIB if you have defined BLASOPT or BLAS_LIB.  Stop.

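So I built LAPACK, following https://www.jianshu.com/p/fe6c4f42aa0b
and edited pccompile to add the LAPACK_LIB path (the script shown above is the final version).
Running the command above again succeeded, so the build can continue.
Run the command: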

make 

This failed on a command run in the directory /src/tce/ccsd_t:

hipcc -c -DTCE_HIP -fno-gpu-rdc -o memory.o memory.hip.cpp
Warning: Type mismatch in argument ‘deltat’ at (1); passed REAL(8) to INTEGER(8) [-Wargument-mismatch]
Compiling ccsd_t_kernels_omp.F...
Compiling tce_hashnsort.F...
Compiling ccsd_t_pstat.F...
Compiling ccsd_t_dot.F...
Compiling ccsd_t_neword.F...
Compiling hybrid.c...
hipcc -c -DTCE_HIP -fno-gpu-rdc -o memory.o memory.hip.cpp
In file included from memory.hip.cpp:1:
In file included from ./header.h:5:
In file included from /opt/rocm/hip/include/hip/hip_runtime_api.h:342:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:44:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_texture_types.h:38:
In file included from /opt/rocm/hip/include/hip/hcc_detail/channel_descriptor.h:28:
/opt/rocm/hip/include/hip/hcc_detail/hip_vector_types.h:45:14: fatal error: 'array' file not found
    #include <array>
             ^~~~~~~
1 error generated.
make[3]: *** [/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o)] Error 1
make[3]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src/tce/ccsd_t'
make[2]: *** [optimized] Error 2
make[2]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src/tce/ccsd_t'
make[1]: *** [subdirs] Error 1
make[1]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src/tce'
make: *** [libraries] Error 1

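This is most likely caused by hipcc searching the wrong header paths. AMD extended Clang+LLVM into the underlying compiler for HIP so that it can target AMD GPUs. In a ROCm environment HIP has three platform modes, selected through the environment variable HIP_PLATFORM: clang, hcc, and nvcc. The hipcc command that HIP provides is really a Perl script that uses HIP_PLATFORM and other environment variables to invoke the appropriate underlying compiler, giving a single uniform way to compile. So the problem should lie in clang's search paths. The following command checks whether clang on its own can compile the file: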

clang -c -DTCE_HIP -fno-gpu-rdc -o memory.o memory.hip.cpp

It fails in the same way. The following commands show the include paths that gcc and clang each use:

gcc -v -x c++ /dev/null -fsyntax-only
clang -v -x c++ /dev/null -fsyntax-only

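They do differ. According to the official documentation, the search paths can be set through four environment variables:
(C_INCLUDE_PATH,
OBJC_INCLUDE_PATH,
CPLUS_INCLUDE_PATH,
OBJCPLUS_INCLUDE_PATH)
After specifying the header search paths with -I and compiling memory.hip.cpp again, compilation passed, but a new problem appeared: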

memory.hip.cpp:132:18: warning: 'hipMallocHost' is deprecated: use hipHostMalloc instead [-Wdeprecated-declarations]
  ptr = morecore(hipMallocHost, bytes);
                 ^
/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:1115:1: note: 'hipMallocHost' has been explicitly marked deprecated here
DEPRECATED("use hipHostMalloc instead")
^
/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:55:41: note: expanded from macro 'DEPRECATED'
#define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
                                        ^
1 warning generated.

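I cannot change this because I lack the permissions, and since it is only a warning I let it go.
Add the following commands to the pccompile file and repeat the build process above.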

export C_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJC_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export CPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJCPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
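The four export lines above can also be generated from gcc's own verbose output instead of being typed by hand. The sketch below is one way to do that, not the method used in the original notes; `include_path_from_verbose` is a hypothetical helper that extracts the `#include <...>` search list and joins it with colons.

```shell
#!/bin/bash
# include_path_from_verbose: read the output of
#   gcc -v -x c++ /dev/null -fsyntax-only
# on stdin and print the "#include <...>" search directories joined by ':'.
include_path_from_verbose() {
    awk '
        /^#include <\.\.\.> search starts here:/ { on = 1; next }
        /^End of search list\./                  { on = 0 }
        on { sub(/^[ \t]+/, ""); printf "%s%s", sep, $0; sep = ":" }
        END { print "" }
    '
}

# gcc prints the search list on stderr, hence the 2>&1.
if command -v gcc >/dev/null 2>&1; then
    dirs=$(gcc -v -x c++ /dev/null -fsyntax-only 2>&1 | include_path_from_verbose)
    if [ -n "$dirs" ]; then
        # Mirror the notes above: give all four variables the same value.
        for v in C_INCLUDE_PATH OBJC_INCLUDE_PATH CPLUS_INCLUDE_PATH OBJCPLUS_INCLUDE_PATH; do
            export "$v=$dirs"
        done
        echo "CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH"
    fi
fi
```

For this to reproduce the paths above, the gcc found on PATH has to be the 8.3.0 installation under /opt/soft/.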

Another problem: -lnwclapack cannot be found

make[1]: Entering directory `/home/jrf/Quantum_Soft/nwchem-hip/src'
gfortran -m64 -ffast-math  -Warray-bounds -std=legacy -fdefault-integer-8 -fno-tree-dominator-opts  -finline-functions -O2 -g -fno-aggressive-loop-optimizations -fno-tree-dominator-opts  -g -O   -I.  -I/home/jrf/Quantum_Soft/nwchem-hip/src/include -I/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/include -DGFORTRAN -DCHKUNDFLW -DGCC4 -DGCC46 -DEXT_INT -DLINUX -DLINUX64 -DPARALLEL_DIAG -DTCE_HIP  -D__HIP_PLATFORM_HCC__=   -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -I/opt/rocm/hsa/include -DCOMPILATION_DATE="'`date +%a_%b_%d_%H:%M:%S_%Y`'" -DCOMPILATION_DIR="'/home/jrf/Quantum_Soft/nwchem-hip'" -DNWCHEM_BRANCH="'7.0.0'"  -c -o nwchem.o nwchem.F
gfortran -m64 -ffast-math  -Warray-bounds -std=legacy -fdefault-integer-8 -fno-tree-dominator-opts  -finline-functions -O2 -g -fno-aggressive-loop-optimizations -fno-tree-dominator-opts  -g -O   -I.  -I/home/jrf/Quantum_Soft/nwchem-hip/src/include -I/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/include -DGFORTRAN -DCHKUNDFLW -DGCC4 -DGCC46 -DEXT_INT -DLINUX -DLINUX64 -DPARALLEL_DIAG -DTCE_HIP  -D__HIP_PLATFORM_HCC__=   -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -I/opt/rocm/hsa/include -DCOMPILATION_DATE="'`date +%a_%b_%d_%H:%M:%S_%Y`'" -DCOMPILATION_DIR="'/home/jrf/Quantum_Soft/nwchem-hip'" -DNWCHEM_BRANCH="'7.0.0'"  -c -o stubs.o stubs.F
make[1]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src'
gfortran   -L/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64 -L/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/lib   -o /home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lnwcutil -lga -larmci -lpeigs -lperfm -lcons -lbq -lnwcutil    -L/home/jrf/tools/openblas-gcc8.3.0/lib -lnwclapack -lopenblas   /home/jrf/lapack-3.9.0 -lnwcblas   -L/home/jrf/tools/ompi/lib -lmpi_usempi -lmpi_mpifh -lmpi     -lcomex -lmpi_usempi -lmpi_mpifh -lmpi -lrt -lpthread -lm -lpthread  -lstdc++  
/usr/bin/ld: cannot find -lnwclapack
/home/jrf/lapack-3.9.0: file not recognized: Is a directory
collect2: error: ld returned 1 exit status
make: *** [all] Error 1

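Judging by the name, this should be a LAPACK-like library that NWChem builds for itself. The CPU version I built earlier does have this library, in ~/Quantum_Soft/nwchem_6.8.1/lib. So I went to ~/Quantum_Soft/nwchem_hip/src/lapack and ran make there, which printed the following: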

NWChem's Performance is degraded by not setting BLASOPT
Please consider using ATLAS, GotoBLAS2, OpenBLAS, Intel MKL,
IBM ESSL, AMD ACML, etc. to improve performance.
If you decide to not use a fast implementation of BLAS/LAPACK,
please define USE_INTERNALBLAS=y and the internal Netlib will be used.
/home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/depend.x  -I/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/include > dependencies

NWChem's Performance is degraded by not setting BLASOPT
Please consider using ATLAS, GotoBLAS2, OpenBLAS, Intel MKL,
IBM ESSL, AMD ACML, etc. to improve performance.
If you decide to not use a fast implementation of BLAS/LAPACK,
please define USE_INTERNALBLAS=y and the internal Netlib will be used.
make: Nothing to be done for `errordgemm'.

Odd: I did set that variable. After running the pccompile file again and running make, libnwclapack.a was generated. Why was it not generated before?

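Looking back a few months later: why did this problem occur?
-lnwclapack looks for the LAPACK library bundled with NWChem, a basic implementation that is comparatively slow. That explains the warning printed when running make in /src/lapack.
The cause:
pccompile set the path to a third-party BLAS library and did not set USE_INTERNALBLAS to y, so when building NWChem, make assumed I wanted an external LAPACK and never entered src/lapack to build the bundled one. Yet the link line still asked for the bundled library with -lnwclapack, which naturally could not be found.
Solutions:
1) Install a LAPACK library, set LAPACK_LIB to its location, and change -lnwclapack to -llapack (tested, works).
2) The installed OpenBLAS library includes part of LAPACK, so simply deleting -llapack may also work (untested).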
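The two consistent configurations implied by this retrospective can be written out as a pccompile fragment. This is a sketch under assumptions: Option A follows the earlier note that OpenBLAS includes LAPACK and that both variables may share one path, and Option B follows the build message about USE_INTERNALBLAS; neither was verified in these notes in exactly this form.

```shell
# Option A: external BLAS and LAPACK. NWChem 7.0.0+ needs LAPACK_LIB
# whenever BLASOPT is set; OpenBLAS bundles LAPACK, so both variables
# can name the same library.
export BLASOPT="-L/home/jrf/tools/openblas-gcc8.3.0/lib -lopenblas"
export LAPACK_LIB="$BLASOPT"

# Option B: fall back to the slower bundled Netlib BLAS/LAPACK, as the
# build message suggests. Leave BLASOPT and LAPACK_LIB unset in that case.
# unset BLASOPT LAPACK_LIB
# export USE_INTERNALBLAS=y
```

Mixing the two, that is, an external BLASOPT together with -lnwclapack on the link line, is what produced the "cannot find -lnwclapack" error.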
Setting that aside, I ran make again and got a long list of undefined references, all to HIP API functions.

lib/LINUX64/libtce.a(sd_t_total.o): In function `hip_impl::kernarg hip_impl::make_kernarg<15ul, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int, (void*)0>(std::tuple<int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int> const&, hip_impl::kernargs_size_align const&, hip_impl::kernarg)':
(.text+0x3ebb3): undefined reference to `hip_impl::kernarg::resize(unsigned long)'
(.text+0x3f41b): undefined reference to `hip_impl::kernarg::~kernarg()'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(sd_t_total.o): In function `hip_impl::kernarg hip_impl::make_kernarg<20ul, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int, (void*)0>(std::tuple<int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int> const&, hip_impl::kernargs_size_align const&, hip_impl::kernarg)':
(.text+0x3f4c2): undefined reference to `hip_impl::kernarg::kernarg(hip_impl::kernarg&&)'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `getGpuMem':
(.text+0x274): undefined reference to `hipMalloc'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `morecore(hipError_t (*)(void**, unsigned long), unsigned long)':
memory.hip.cpp:(.text+0x579): undefined reference to `hipGetLastError'
memory.hip.cpp:(.text+0x580): undefined reference to `hipGetErrorString'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `getHostMem':
(.text+0x8f4): undefined reference to `hipMallocHost'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `clearGpuFreeList()':
memory.hip.cpp:(.text+0xf35): undefined reference to `hipFree'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `clearHostFreeList()':
memory.hip.cpp:(.text+0x1015): undefined reference to `hipHostFree'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(hybrid.o): In function `device_init_':
hybrid.c:(.text+0x48): undefined reference to `hipGetDeviceCount'
hybrid.c:(.text+0x8f): undefined reference to `hipSetDevice'
collect2: error: ld returned 1 exit status

They come from the command below, the final link of the nwchem executable:

gfortran   -L/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64 -L/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/lib   -o /home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lnwcutil -lga -larmci -lpeigs -lperfm -lcons -lbq -lnwcutil    -L/home/jrf/tools/openblas-gcc8.3.0/lib -lnwclapack -lopenblas   -L/home/jrf/lapack-3.9.0 -lnwcblas   -L/home/jrf/tools/ompi/lib -lmpi_usempi -lmpi_mpifh -lmpi     -lcomex -lmpi_usempi -lmpi_mpifh -lmpi -lrt -lpthread -lm -lpthread  -lstdc++

warning: ‘hipMallocHost’ is deprecated: use hipHostMalloc instead [-Wdeprecated-declarations]

The issue below shows the same symptom with a different cause (apparently two versions of HIP were installed), but it explains how to use nm:


MIGraphX fails to link for Vega10 on ROCm 2.3


Analysing the static library libtce.a with nm shows that one of the hip_impl functions involved in the errors is a weak symbol (W):

(base) [jrf@cu06 LINUX64]$ nm -A libtce.a | c++filt | grep hip_impl::hipLaunchKernelGGLImpl
libtce.a:sd_t_total.o:00000000000144a0 W hip_impl::hipLaunchKernelGGLImpl(unsigned long, dim3 const&, dim3 const&, unsigned int, ihipStream_t*, void**)

nm then showed that libhip_hcc.so in /opt/rocm/hip/lib contains the implementation of this function, so that library's search path needs to be added. In pccompile, add:

export LIBRARY_PATH=/opt/rocm/hip/lib:$LIBRARY_PATH
export LIBS="-lhip_hcc"
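The same nm check can be repeated for the other undefined HIP symbols before relinking. The sketch below is an illustration, not the exact commands used in these notes; `has_defined_symbol` is a hypothetical helper that scans `nm` output for a defined symbol with a given name.

```shell
#!/bin/bash
# has_defined_symbol NAME: read `nm` output on stdin and succeed if it
# lists NAME as a defined symbol. Defined symbols have three fields
# (address, type, name); undefined ones have only two and are skipped.
has_defined_symbol() {
    awk -v s="$1" '
        NF >= 3 { n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
        END { exit !found }
    '
}

lib=/opt/rocm/hip/lib/libhip_hcc.so
for sym in hipMalloc hipFree hipGetDeviceCount hipSetDevice; do
    if nm -D --defined-only "$lib" 2>/dev/null | has_defined_symbol "$sym"; then
        echo "$sym: defined in $lib"
    else
        echo "$sym: not found in $lib"
    fi
done
```

The `sub(/@.*/, ...)` step strips any symbol-version suffix that newer binutils may print, so the comparison is on the bare name.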

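The following error then appeared. Judging by the final combined link command, the two environment variables above had no effect. Also, for reasons I do not understand, libmpi_usempi.so is missing from the OpenMPI built with gcc 8.3.0 and exists only in the one built with 4.8.5; removing this library from the link line does not affect the build.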

make[1]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src'
gfortran   -L/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64              \
-L/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/lib               \
 -o /home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/nwchem nwchem.o stubs.o \
 -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim \
 -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian  \
  -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd   \
  -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze   \
  -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce    \
  -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lnwcutil -lga -larmci    \
  -lpeigs -lperfm -lcons -lbq -lnwcutil    \ 
  -L/home/jrf/tools/openblas-gcc8.3.0/lib -lnwclapack -lopenblas   \
  -L/home/jrf/lapack-3.9.0 -lnwcblas   -L/home/jrf/tools/ompi-gcc8.3.0/lib  \
  -lmpi_usempif08  -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi     \
  -lcomex -lmpi_usempi -lmpi_mpifh   \
  -lmpi -lrt -lpthread -lm -lpthread  -lstdc++
/usr/bin/ld: cannot find -lnwclapack
/usr/bin/ld: cannot find -lmpi_usempi
collect2: error: ld returned 1 exit status
make: *** [all] Error 1

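I then tried linking the final executable by hand to see whether it would run.
...
The hand-linked build ran successfully.
The run finished more than ten minutes faster than the CPU-only version.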
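The exact hand-written link command was not recorded. The sketch below is only a guess at one way to derive it from the failing gfortran line, assembled from the fixes mentioned earlier: swap -lnwclapack for -llapack, drop -lmpi_usempi, and append the HIP runtime library. `patch_link_flags` is a hypothetical helper, and the command is assumed to be on one line with no backslash continuations.

```shell
#!/bin/bash
# patch_link_flags: read a whitespace-separated link command on stdin,
# replace -lnwclapack with -llapack, drop -lmpi_usempi (exact match only,
# so -lmpi_usempif08 survives), and append the HIP runtime library.
patch_link_flags() {
    tr -s ' \t\n' '\n' |
        sed -e 's/^-lnwclapack$/-llapack/' |
        grep -vx -e '-lmpi_usempi' |
        tr '\n' ' '
    printf '%s\n' '-L/opt/rocm/hip/lib -lhip_hcc'
}

# Shortened example; the real input would be the full gfortran line above.
echo 'gfortran -o nwchem nwchem.o stubs.o -ltce -lnwclapack -lopenblas -lmpi_usempi -lmpi' |
    patch_link_flags
```

The printed line can then be run from the src directory, where nwchem.o and stubs.o live. Using -llapack assumes a liblapack is present in one of the -L directories, here /home/jrf/lapack-3.9.0.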
