Abstract:
A fast and accurate method for measuring the floating-point performance of parallel systems is proposed, using the HPL (High-Performance Linpack) benchmark on a cluster interconnected by a Myrinet network. HPL shows very good scalability on this system across different BLAS implementations. The factors that most strongly affect the measured performance include the BLAS library, the layout of the processor grid, the block size used in the LU factorization, and the size of the linear system. We also found that higher performance can be achieved when all communication within each node goes through shared memory.
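The tuning factors listed above map onto HPL's main input parameters: the problem size N, the block size NB, and the P × Q process grid. The following is a minimal sketch, not the authors' procedure. It computes the Gflops rate the way HPL reports it, counting (2/3)N³ + (3/2)N² floating-point operations. It also shows two common rules of thumb that are assumptions here, not findings of this paper: size N so the N × N double-precision matrix fills about 80% of total memory, rounded down to a multiple of NB, and prefer near-square process grids. The function names and the default NB = 64 are hypothetical.

```python
import math


def hpl_gflops(n: int, seconds: float) -> float:
    """Rate as HPL reports it: (2/3 N^3 + 3/2 N^2) flops over wall time, in Gflops."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def problem_size(total_mem_bytes: float, fraction: float = 0.8, nb: int = 64) -> int:
    """Rule-of-thumb N (assumption): the N x N matrix of 8-byte doubles uses
    `fraction` of total memory; round down to a multiple of the block size NB
    (nb=64 is a hypothetical default)."""
    n = int(math.sqrt(fraction * total_mem_bytes / 8))
    return n - n % nb


def process_grids(nprocs: int) -> list[tuple[int, int]]:
    """All P x Q grids with P <= Q and P * Q == nprocs, most nearly square first
    (a common heuristic, not a result from this paper)."""
    grids = [(p, nprocs // p) for p in range(1, math.isqrt(nprocs) + 1) if nprocs % p == 0]
    return sorted(grids, key=lambda g: g[1] - g[0])


if __name__ == "__main__":
    print(process_grids(16))                 # candidate P x Q layouts for 16 processes
    print(problem_size(8 * 2**30))           # N for 8 GiB of aggregate memory
    print(round(hpl_gflops(1000, 1.0), 4))   # Gflops for N = 1000 solved in 1 s
```

In practice each candidate (BLAS library, grid, NB, N) would be swept in separate HPL runs and the fastest combination kept. The helpers above only narrow down the search space.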