HPL test Compiling History

vanderong

Category: Linux

Article link: https://blog.csdn.net/vanderong/article/details/8168343

Some problems I ran into while compiling and testing HPL.

1. Actual Performance, Theoretical Peak Performance

Formula: Peak performance = (floating-point operations per cycle per core) * (number of cores) * (clock frequency)
http://www.cnblogs.com/kerrycode/archive/2012/07/06/2578658.html

http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html#_What_is_the_theoretical%20peak%20perfor

http://software.intel.com/en-us/articles/performance-tools-for-software-developers-hpl-application-note/
The hybrid (MPI + OpenMP) parallel versions of the HPL binaries are also included in the Intel package described in the application note above.

cat /proc/cpuinfo | grep GHz | uniq
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
(The cpu MHz value in /proc/cpuinfo is more accurate than this nominal figure; divide it by 1000 to get GHz.)

cat /proc/meminfo

Problem size: N^2 * 8 bytes (size of a double) < memory size * 80%. On this node N could reach more than 37000.

1 node: 12188376 kB of memory (11.62 GB)
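
The sizing rule above can be sketched in Python. This is only an illustration, not part of the original notes: the function name `max_problem_size` and its parameters are hypothetical, and it assumes that the kB figure from /proc/meminfo counts units of 1024 bytes and that N is rounded down to a multiple of the block size NB.

```python
import math


def max_problem_size(mem_kib: int, fraction: float = 0.8, nb: int = 1) -> int:
    """Largest N (a multiple of nb) such that N*N*8 bytes fits in
    `fraction` of the given memory.

    mem_kib  -- total memory in units of 1024 bytes (MemTotal in /proc/meminfo)
    fraction -- share of memory the matrix may use (0.8 is the rule of thumb)
    nb       -- HPL block size; N is rounded down to a multiple of it
    """
    usable_bytes = int(mem_kib * 1024 * fraction)
    n = math.isqrt(usable_bytes // 8)  # 8 bytes per double-precision element
    return (n // nb) * nb


# Memory figure recorded in these notes for one node.
one_node = max_problem_size(12188376)
one_node_nb1024 = max_problem_size(12188376, nb=1024)
two_nodes = max_problem_size(2 * 12188376)
print(one_node, one_node_nb1024, two_nodes)
```

For the 12188376 kB node the 80% rule gives an N of roughly 35,300, so an N of 37000 corresponds to using closer to 88% of memory.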

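CPU clock: 2532.970 MHz (2.53 GHz), about 10.13 GFLOPS per core at 4 double-precision flops per cycle (9.8945 when the MHz value is divided by 1024 instead of 1000).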
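
A minimal sketch of the peak formula from this section, for checking the per-core figure. The helper name `peak_gflops` is hypothetical. It assumes 4 double-precision flops per cycle per core (the value the per-core figure above implies) and a decimal MHz to GHz conversion.

```python
def peak_gflops(flops_per_cycle: int, cores: int, freq_mhz: float) -> float:
    """Theoretical peak in GFLOPS: flops per cycle * cores * frequency in GHz."""
    return flops_per_cycle * cores * freq_mhz / 1000.0


# Assumption: 4 double-precision flops per cycle per core.
per_core = peak_gflops(4, 1, 2532.970)
print(f"{per_core:.2f} GFLOPS per core")
```

Multiplying by the number of cores in a node gives the CPU part of the node's peak; the GPU peak is added on top of that.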

Check NVIDIA information

nvidia-smi
Tesla C2050
Total memory: 2687 MB
http://www.siliconmechanics.com/files/C2050Benchmarks.pdf
Peak: 1030 GFLOPS (single precision), 515 GFLOPS (double precision)

Test example:

Ratio of GPU max GFLOPS to (peak CPU + GPU GFLOPS): 0.8/0.7

N      NB    P  Q  Time (s)  Gflops     Note
50000  768   1  2  118.69    7.021e+02  59% of peak
36864  1024  1  1  95.26     3.506e+02  N = 36k
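
The two result lines can be cross-checked with a short Python sketch, reading the first field of each line as N and the fifth as the run time in seconds. HPL counts 2/3 N^3 + 3/2 N^2 floating-point operations for a run. Everything about the hardware here is an assumption made for illustration and is not stated in the notes: 8 CPU cores and one Tesla C2050 per node, 4 double-precision flops per cycle per core at 2.53297 GHz, and the 1 x 2 grid running on two nodes. The function names are hypothetical.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops for an N x N solve, using HPL's operation count."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Fraction of theoretical peak actually achieved."""
    return measured_gflops / peak_gflops


# Assumptions (not from the notes): 8 cores and 1 GPU per node.
ASSUMED_CORES_PER_NODE = 8
CPU_GFLOPS_PER_CORE = 4 * 2.53297   # 4 DP flops/cycle at 2.53297 GHz
GPU_GFLOPS = 515.0                  # Tesla C2050 double-precision peak
node_peak = GPU_GFLOPS + ASSUMED_CORES_PER_NODE * CPU_GFLOPS_PER_CORE

two_node = hpl_gflops(50000, 118.69)
one_node = hpl_gflops(36864, 95.26)
print(f"{two_node:.1f} Gflops, {efficiency(two_node, 2 * node_peak):.0%} of peak")
print(f"{one_node:.1f} Gflops, {efficiency(one_node, node_peak):.0%} of peak")
```

Under these assumptions the recomputed Gflops agree with the recorded values, and the two-node run lands at the recorded 59%.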

2. Launcher preference: mpirun > mpiexec > mpirun_rsh

3. Intel MPI usage

Environment setup

source /home/limin/intel/impi/4.0.3.008/bin64/mpivars.sh

export LD_LIBRARY_PATH=/opt/intel/composer_xe_2013.0.079/mkl/lib/intel64:$LD_LIBRARY_PATH

Run commands

8 MPI processes, 4 per host, with the pinning domain set to the whole node:
/home/limin/intel/impi/4.0.3.008/intel64/bin/mpirun -n 8 -f hosts -perhost 4 -genv I_MPI_PIN_DOMAIN node ./xhpl_hybrid_intel64

Hybrid run with 8 MPI processes, 2 per host, and 4 OpenMP threads per process:
export OMP_NUM_THREADS=4
/home/limin/intel/impi/4.0.3.008/intel64/bin/mpirun -n 8 -f hosts -perhost 2 -genv I_MPI_PIN_DOMAIN=omp:scatter ./xhpl_hybrid_intel64

Other commands

~/mv/mv2/bin/mpdboot -n
~/mv/mv2/bin/mpiexec -machinefile hosts -np 1 ./xhpl_hybrid_intel64

Use Open MPI's hwloc to view hardware topology information.

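HPL Parameter Explanation:

The T/V column of the HPL output (for example WR10L4L2) encodes the variant that was run. W means wall-clock timing and R means row-major process mapping; the remaining characters are:

T/V:      WR  1      0      L      4     L/C/R  2
Meaning:      depth  bcast  rfact  NDIV  pfact  NBMIN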
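
The field-by-field mapping above can be turned into a small decoder. This is a sketch for illustration: the function name `decode_tv` is hypothetical, and it assumes every numeric field is a single digit, so it would not handle an NBMIN or NDIV of 10 or more.

```python
def decode_tv(tv: str) -> dict:
    """Split an HPL T/V string such as 'WR10L4L2' into named fields."""
    if len(tv) != 8 or tv[0] != "W":
        raise ValueError(f"unexpected T/V string: {tv!r}")
    mapping = {"R": "row-major", "C": "column-major"}
    return {
        "timing": "wall",
        "mapping": mapping[tv[1]],
        "depth": int(tv[2]),    # lookahead depth
        "bcast": int(tv[3]),    # broadcast algorithm index
        "rfact": tv[4],         # recursive panel factorization: L, C or R
        "ndiv": int(tv[5]),     # number of panels in the recursion
        "pfact": tv[6],         # panel factorization: L, C or R
        "nbmin": int(tv[7]),    # recursion stopping criterion
    }


fields = decode_tv("WR10L4L2")
print(fields)
```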

TACC's suggestion

BCAST=5 // 4 or 5 may be competitive on machines whose nodes are very fast compared to the network

MV2_IBA_HCA=mlx4_0
MV2_VBUF_TOTAL_SIZE=140000 // same value as MV2_IBA_EAGER_THRESHOLD
MV2_IBA_EAGER_THRESHOLD=140000
MV2_SHOW_ENV_INFO=1
MV2_ENABLE_AFFINITY=0
MV2_USE_SHARED_MEM=0 // go through the network adapter; disable CPU binding
MV2_SMP_USE_LIMIC2=0 // p48: disabled, so no need to change it
