HPL Test and Compiling History

Some problems I ran into while compiling and testing.

1. Actual Performance, Theoretical Peak Performance

Formula:  Peak perf = (#floating-point operations per cycle per core) * (#cores) * Freq
http://www.cnblogs.com/kerrycode/archive/2012/07/06/2578658.html
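As a quick check of the formula, here is a minimal Python sketch. The 4 double-precision FLOPs per cycle (SSE on a Westmere core such as the E5640) and the 8 cores per node (two 4-core sockets) are assumptions for illustration, not values read from the machine.

```python
def peak_gflops(flops_per_cycle: int, cores: int, freq_ghz: float) -> float:
    """Theoretical peak in GFLOPS: FLOPs per cycle per core * cores * GHz."""
    return flops_per_cycle * cores * freq_ghz


if __name__ == "__main__":
    # Assumed node: 4 DP FLOPs/cycle/core, 8 cores, 2.53 GHz.
    node_peak = peak_gflops(4, 8, 2.53)
    print(f"{node_peak:.2f} GFLOPS per node")
```

The frequency should be in GHz, so an MHz reading from /proc/cpuinfo is divided by 1000 before it goes in.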

http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html#_What_is_the_theoretical%20peak%20perfor

http://software.intel.com/en-us/articles/performance-tools-for-software-developers-hpl-application-note/
The hybrid (MPI + OpenMP) parallel HPL binaries are also included in the package.


cat /proc/cpuinfo | grep GHz | uniq
model name   : Intel(R) Xeon(R) CPU      E5640  @ 2.67GHz
(the "cpu MHz" value from /proc/cpuinfo, divided by 1000, is more accurate than this nominal frequency)

cat /proc/meminfo

Problem size: N^2 * 8 bytes (one double) < 80% of memory size; N could reach more than 37000

1 node: 12188376 kB = 11.62 GB

2532.970 MHz = 2.53 GHz; 2.53 GHz * 4 DP flops/cycle = 10.13 GFLOPS per core
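The sizing rule can be scripted. This Python sketch reads MemTotal from /proc/meminfo text (the "kB" there means 1024 bytes) and returns the largest N with N^2 * 8 bytes within the chosen fraction of memory. The 80% default follows the rule of thumb above; rounding N down to a multiple of NB is a common HPL tuning habit and is an assumption here, not something the notes state.

```python
import math


def memtotal_bytes(meminfo_text: str) -> int:
    """Return MemTotal in bytes from /proc/meminfo content (its 'kB' is 1024 bytes)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal line not found")


def hpl_problem_size(total_bytes: int, fraction: float = 0.8, nb: int = 0) -> int:
    """Largest N with N*N*8 bytes <= fraction * total_bytes.

    If nb > 0, N is rounded down to a multiple of the block size NB.
    """
    n = math.isqrt(int(fraction * total_bytes) // 8)  # 8 bytes per double
    if nb > 0:
        n -= n % nb
    return n


if __name__ == "__main__":
    # MemTotal value of the node listed above; replace with the real file contents.
    sample = "MemTotal:       12188376 kB\n"
    per_node = memtotal_bytes(sample)
    for nodes in (1, 2):
        print(nodes, "node(s):", hpl_problem_size(nodes * per_node))
```

For the 12188376 kB node, 80% gives N = 35328 on one node and N = 49961 across two such nodes, since N grows with the square root of total memory.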


Check NVIDIA GPU information

nvidia-smi
Tesla C2050
Total memory: 2687 MB
http://www.siliconmechanics.com/files/C2050Benchmarks.pdf
Peak: 1030 GFLOPS (single precision), 515 GFLOPS (double precision)

Test example:

Ratio of GPU max Gflops to (peak CPU + GPU): 0.8/0.7

    N      NB   P   Q   Time (s)    Gflops      Note
50000     768   1   2     118.69    7.021e+02   59%
36864    1024   1   1      95.26    3.506e+02   (36k)
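The 59% figure is Rmax divided by the combined CPU + GPU peak. A minimal Python sketch of that ratio follows; the node layout (two nodes, each with 8 cores at 4 DP FLOPs/cycle and 2.53 GHz plus one C2050 at 515 GFLOPS double precision) is an assumption for illustration, not taken from the run. Under these assumptions the 50000 run's 7.021e+02 Gflops works out to about 59%.

```python
def hybrid_rpeak(nodes: int, cpu_gflops_per_node: float, gpu_gflops_per_node: float) -> float:
    """Combined theoretical peak of all CPUs and GPUs, in GFLOPS."""
    return nodes * (cpu_gflops_per_node + gpu_gflops_per_node)


def efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """HPL efficiency: measured Gflops divided by theoretical peak."""
    return rmax_gflops / rpeak_gflops


if __name__ == "__main__":
    # Assumed: 2 nodes, 8 cores * 4 DP FLOPs/cycle * 2.53 GHz, one C2050 (515 GFLOPS DP) each.
    rpeak = hybrid_rpeak(2, 8 * 4 * 2.53, 515.0)
    print(f"Rpeak {rpeak:.1f} GFLOPS, efficiency {efficiency(702.1, rpeak):.1%}")
```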


2. Launcher preference: mpirun > mpiexec > mpirun_rsh

3. Intel MPI usage

ENV

source /home/limin/intel/impi/4.0.3.008/bin64/mpivars.sh

export LD_LIBRARY_PATH=/opt/intel/composer_xe_2013.0.079/mkl/lib/intel64:$LD_LIBRARY_PATH

RUN COMMAND
/home/limin/intel/impi/4.0.3.008/intel64/bin/mpirun -n 8 -f hosts -perhost 4 -genv I_MPI_PIN_DOMAIN node ./xhpl_hybrid_intel64
export OMP_NUM_THREADS=4
/home/limin/intel/impi/4.0.3.008/intel64/bin/mpirun -n 8 -f hosts -perhost 2 -genv I_MPI_PIN_DOMAIN omp:scatter ./xhpl_hybrid_intel64
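The second command starts 8 ranks, 2 per host, each with 4 OpenMP threads. The Python sketch below checks such a layout before launching: how many hosts the hosts file should list, whether threads per host match the cores, and whether the HPL.dat grid P*Q equals the rank count. The 8 cores per host (two 4-core E5640 sockets) and P=2, Q=4 are hypothetical values.

```python
import math
from typing import List, Tuple


def check_layout(ranks: int, ranks_per_host: int, omp_threads: int,
                 cores_per_host: int, p: int, q: int) -> Tuple[int, List[str]]:
    """Sanity-check a hybrid MPI + OpenMP HPL launch.

    Returns (hosts needed, list of warnings).
    """
    warnings: List[str] = []
    hosts = math.ceil(ranks / ranks_per_host)
    if p * q != ranks:
        warnings.append(f"P*Q = {p * q} but mpirun -n is {ranks}")
    threads = ranks_per_host * omp_threads
    if threads > cores_per_host:
        warnings.append(f"{threads} threads per host oversubscribe {cores_per_host} cores")
    elif threads < cores_per_host:
        warnings.append(f"only {threads} of {cores_per_host} cores per host are used")
    return hosts, warnings


if __name__ == "__main__":
    # -n 8 -perhost 2 with OMP_NUM_THREADS=4 on hypothetical 8-core hosts, P=2, Q=4.
    print(check_layout(8, 2, 4, 8, 2, 4))
```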


Other commands (MVAPICH2, MPD launcher)

~/mv/mv2/bin/mpdboot -n
~/mv/mv2/bin/mpiexec -machinefile hosts -np 1 ./xhpl_hybrid_intel64

Use Open MPI's hwloc to view hardware information


HPL Parameter Explanation (T/V column of the HPL output):

WR      depth   bcast   rfact   NDIV   pfact   NBMIN
WR      1       0       L       4      L/C/R   2
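To read these codes from an output file, a small decoder helps. This Python sketch assumes the layout HPL prints in its T/V column: W (wall time), R or C (process-grid order), then depth, BCAST, RFACT, NDIV, PFACT and NBMIN, with BCAST codes 0 to 5 and L/C/R for left, Crout and right as listed in HPL.dat. The string WR11C2R4 is only an example, and single-digit depth and NDIV are assumed.

```python
BCAST = {"0": "1ring", "1": "1ringM", "2": "2ring",
         "3": "2ringM", "4": "Blong", "5": "BlongM"}
FACT = {"L": "left", "C": "Crout", "R": "right"}


def decode_tv(tv: str) -> dict:
    """Split an HPL T/V code such as 'WR11C2R4' into its parameters."""
    if len(tv) < 8 or tv[0] != "W":
        raise ValueError(f"unexpected T/V code: {tv!r}")
    return {
        "grid_order": "row-major" if tv[1] == "R" else "column-major",
        "depth": int(tv[2]),
        "bcast": BCAST[tv[3]],
        "rfact": FACT[tv[4]],
        "ndiv": int(tv[5]),
        "pfact": FACT[tv[6]],
        "nbmin": int(tv[7:]),
    }


if __name__ == "__main__":
    print(decode_tv("WR11C2R4"))
```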

TACC's suggestions:

BCAST=5   (in HPL.dat; broadcast variants 4 or 5 may be competitive on machines whose nodes are very fast compared with the network)

MV2_IBA_HCA=mlx4_0
MV2_VBUF_TOTAL_SIZE=140000       (same value as MV2_IBA_EAGER_THRESHOLD)
MV2_IBA_EAGER_THRESHOLD=140000
MV2_SHOW_ENV_INFO=1
MV2_ENABLE_AFFINITY=0            (disable CPU binding)
MV2_USE_SHARED_MEM=0             (send intra-node traffic through the network card)
MV2_SMP_USE_LIMIC2=0             (p48: disabled, so no need to change)
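MVAPICH2's mpirun_rsh accepts VAR=VALUE pairs on its command line before the executable, so these settings can be passed per run instead of exported. The Python sketch below assembles such a command and enforces the note that MV2_VBUF_TOTAL_SIZE equals MV2_IBA_EAGER_THRESHOLD; the process count 8 and the hosts file name "hosts" are placeholders.

```python
import shlex

# Settings from the list above.
MV2_ENV = {
    "MV2_IBA_HCA": "mlx4_0",
    "MV2_VBUF_TOTAL_SIZE": "140000",
    "MV2_IBA_EAGER_THRESHOLD": "140000",
    "MV2_SHOW_ENV_INFO": "1",
    "MV2_ENABLE_AFFINITY": "0",
    "MV2_USE_SHARED_MEM": "0",
    "MV2_SMP_USE_LIMIC2": "0",
}


def mpirun_rsh_cmd(nprocs: int, hostfile: str, exe: str, env: dict) -> list:
    """Build an mpirun_rsh argv with VAR=VALUE settings placed before the executable."""
    if env.get("MV2_VBUF_TOTAL_SIZE") != env.get("MV2_IBA_EAGER_THRESHOLD"):
        raise ValueError("MV2_VBUF_TOTAL_SIZE should equal MV2_IBA_EAGER_THRESHOLD")
    assignments = [f"{key}={value}" for key, value in env.items()]
    return ["mpirun_rsh", "-np", str(nprocs), "-hostfile", hostfile, *assignments, exe]


if __name__ == "__main__":
    print(shlex.join(mpirun_rsh_cmd(8, "hosts", "./xhpl_hybrid_intel64", MV2_ENV)))
```

Keeping the settings in one dictionary makes it easy to diff runs; the printed line can be pasted into a job script.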


