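1. Execution is launched through libffi (FFI stands for Foreign Function Interface); see the introduction at https://www.cnblogs.com/findumars/p/4882620.html.
2. The Clang front end contains a lot of OpenMP-related handling, and not only in the openmp subfolders; for example, clang/lib/AST/ExprConstant.cpp handles the expansion of the builtin omp_is_initial_device.
3. To support the OpenMP runtime, the clang-offload-wrapper tool is invoked to pack multiple device binaries into the host binary. clang-offload-wrapper also adds the necessary .omp_offload functions to the Ctors (global constructors), so that runtime initialization is finished before main starts.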
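4. The argument type ffi_type_pointer and the return type ffi_type_void used in __tgt_rtl_run_target_team_region fix the form of a device-side function:
every parameter is passed by reference (as a pointer) and the return type is void.
Also, the device-side and host-side addresses of a variable (the formal parameters) are not the same.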
5. In the sequence Init -> LoadRTLs -> RegisterLib -> __tgt_target_mapper -> UnregisterLib -> Deinit, RegisterLib is in fact brought in through __tgt_target, and __tgt_target itself corresponds to #pragma omp target map(from: isHost).
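6. Use ompi_info | grep btl to check local openib support, and pass --mca btl_base_verbose 100 to see more detailed errors; see https://github.com/open-mpi/ompi/issues/5280.
Note: Open MPI must be configured with --with-ucx=$ucx_install_dir to support UCX. The UCX build used here: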
pushd $DEP_DIR/ucx-1.5.2
CFLAGS=$optflags CXXFLAGS=$optflags LDFLAGS=$ldflags \
./contrib/configure-release --prefix=$ucx_install_dir \
--with-knem=$knem_install_dir --enable-optimizations
if [ $COMPILE_TYPE == "gcc" ];then
sed -i 's/-Werror//g' src/uct/Makefile
else
sed -i 's/-Werror/-Wno-error/' `grep -lr '\-Werror' ./*`
fi
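The note about --with-ucx can be sketched as a dry-run helper. It only prints a configure command and does not run it. The function name ompi_configure_cmd and the /opt/... paths are hypothetical placeholders, not taken from the build script above; --prefix and --with-ucx are the only Open MPI configure options assumed.

```shell
# Hypothetical helper: print (do not run) an Open MPI configure command
# that points --with-ucx at an existing UCX install prefix.
ompi_configure_cmd() {
    ompi_prefix=$1   # where Open MPI should be installed (placeholder path)
    ucx_prefix=$2    # UCX install prefix, i.e. $ucx_install_dir in the notes
    printf './configure --prefix=%s --with-ucx=%s\n' "$ompi_prefix" "$ucx_prefix"
}

# Example with placeholder paths; the printed line would be run
# from inside the Open MPI source tree.
ompi_configure_cmd /opt/openmpi /opt/ucx
```

After such a build, ompi_info | grep -i ucx should list the UCX components if the option took effect.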
7. The MPI build prints a notice about the configuration needed to use the installed libraries:
/home/zhongyunde/hpc-workload/TOPN2023/dependence_llvm/openmpi-4.0.4/lib
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the 'LD_LIBRARY_PATH' environment variable during execution
- add LIBDIR to the 'LD_RUN_PATH' environment variable during linking
- use the '-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to '/etc/ld.so.conf'
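The first option in the notice (adding LIBDIR to LD_LIBRARY_PATH at run time) could look roughly like this. The helper name prepend_dir is hypothetical; it only avoids adding the same directory twice, and LIBDIR is the path printed by the build above.

```shell
# Hypothetical helper: print CURRENT with DIR prepended,
# leaving CURRENT unchanged if DIR is already one of its entries.
prepend_dir() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *)          printf '%s\n' "${dir}${current:+:$current}" ;;
    esac
}

# LIBDIR as reported by the Open MPI build in these notes.
LIBDIR=/home/zhongyunde/hpc-workload/TOPN2023/dependence_llvm/openmpi-4.0.4/lib
LD_LIBRARY_PATH=$(prepend_dir "$LIBDIR" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

The '-Wl,-rpath -Wl,LIBDIR' option from the notice is the link-time alternative and needs no environment change at run time.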
8. To enable OMP mode when running LAMMPS, see https://matsci.org/t/build-how-to-enable-the-omp-mode-with-new-23jun2022-version/48234/1
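9. The --use-hwthread-cpus option detects the number of hardware threads automatically, so -np 128 can be dropped.
--cpu-set 0-127 --bind-to cpulist:ordered (this only makes a difference on hardware whose core count is not 128)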
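The two launch styles above can be compared side by side with a dry-run sketch. The helper mpirun_cmd and the program name ./app are hypothetical, and the flags are copied verbatim from the note rather than checked against a particular Open MPI release. Combining -np 128 with the explicit CPU list in the "pinned" variant is an assumption about how the second line was meant to be used.

```shell
# Hypothetical dry-run helper: print one of the two mpirun variants.
# Nothing is launched; the output is just the command line.
mpirun_cmd() {
    mode=$1
    shift
    case "$mode" in
        hwthreads)
            # Let mpirun count hardware threads itself, so no -np is given.
            printf 'mpirun --use-hwthread-cpus %s\n' "$*" ;;
        pinned)
            # Explicit process count and CPU list (assumed combination).
            printf 'mpirun -np 128 --cpu-set 0-127 --bind-to cpulist:ordered %s\n' "$*" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

mpirun_cmd hwthreads ./app
mpirun_cmd pinned ./app
```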
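10. The difference between -fopenmp and -fopenmp=libomp is not clear to me, but using -fopenmp=libomp avoided a runtime problem. (In upstream Clang, plain -fopenmp selects the default runtime fixed at build time by CLANG_DEFAULT_OPENMP_RUNTIME, normally libomp, so a difference would suggest a toolchain built with another default.)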
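11. When collecting perf hotspots for an OpenMP program compiled with LLVM, explicitly set export OMP_WAIT_POLICY=PASSIVE so that the hotspots do not land in wait. In addition, strace can be used to examine where system time is spent (see D58148).
See the "LLVM/OpenMP Runtimes" page of the LLVM/OpenMP 18.0.0git documentation.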
strace -ff -ttt ./interp2d_mp -t bicubic -f interp2d_eval
KMP_USE_YIELD=0
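Put together, the profiling setup from this item might be scripted as follows. The helper profile_cmds is hypothetical and only prints the commands. perf record -g is an assumed choice of perf invocation, since the note does not give one. The comment on KMP_USE_YIELD reflects my reading of the libomp documentation, not something stated in these notes.

```shell
# From the note: idle OpenMP threads should sleep rather than spin,
# so that wait loops do not dominate the perf profile.
export OMP_WAIT_POLICY=PASSIVE
# From the note: as I understand libomp, 0 turns off yielding in the
# runtime's wait loops.
export KMP_USE_YIELD=0

# Hypothetical dry-run helper: print the profiling commands for a program.
profile_cmds() {
    printf 'perf record -g -- %s\n' "$*"
    printf 'strace -ff -ttt %s\n' "$*"
}

profile_cmds ./interp2d_mp -t bicubic -f interp2d_eval
```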
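12. OpenMP treats variables declared outside the #pragma omp region as shared by default; this can be changed with the private clause.
13. OpenMP offload code is translated into OpenMP RTL (runtime library) calls; see The LLVM Compiler Infrastructure Project.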