The high performance computing (HPC) community is obsessed with the general matrix-matrix multiply (GEMM) routine, and not without reason. Most, if not all, Level 3 Basic Linear Algebra Subroutines (BLAS) can be written in terms of GEMM, and the performance of many higher-level linear algebra solvers (e.g., LU, Cholesky) depends on GEMM's performance. Achieving high performance in GEMM is highly architecture dependent, so for each new architecture that comes out, GEMM must be reprogrammed and tuned to reach maximal performance. Moreover, as emerging computer architectures feature wider vector units and multi- to many-core processors, GEMM performance hinges on how well these features are utilized. In this research, three Intel processor architectures are explored, including the new Intel MIC architecture; each has a different vector length and core count. The effort required to create three Level 3 BLAS routines (GEMM, TRSM, SYRK) is examined.
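To make the routine under discussion concrete, here is a minimal reference (deliberately unoptimized) GEMM in C, computing C := alpha*A*B + beta*C for row-major matrices. The function name and layout are illustrative assumptions, not the paper's implementation; tuned versions add cache blocking and vectorization on top of this loop nest.

```c
#include <stddef.h>

/* Reference GEMM sketch (hypothetical name, row-major storage):
 * C := alpha * A * B + beta * C, with A m x k, B k x n, C m x n.
 * Real high-performance implementations block for cache and use
 * architecture-specific vector instructions instead of this naive nest. */
void gemm_ref(size_t m, size_t n, size_t k,
              double alpha, const double *A, const double *B,
              double beta, double *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];  /* inner dot product */
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```

Other Level 3 routines reduce to calls of this form; for example, SYRK (C := alpha*A*A^T + beta*C) is a GEMM in which the second operand is the transpose of the first.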
BLAS / GEMM / POTRF
This article examines the HPC community's emphasis on the general matrix-matrix multiply (GEMM): most Level 3 Basic Linear Algebra Subroutines (BLAS) can be expressed in terms of GEMM, and the performance of linear algebra solvers depends on it. It studies how to achieve high performance by optimizing GEMM on three Intel processor architectures, including the new Intel MIC architecture. It also investigates the impact of four shared-memory parallel programming models (OpenMP, Pthreads, Cilk, and TBB) on the GEMM, TRSM, SYRK, and Cholesky (POTRF) routines, revealing which model is better suited to writing such routines and which architectural features matter most for performance.
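Of the four shared-memory models compared, OpenMP is the most compact to illustrate. The sketch below (hypothetical name, not the study's code) parallelizes the outer loop of a naive GEMM so each thread owns a disjoint band of rows of C and no synchronization is needed; it degrades gracefully to serial execution when compiled without OpenMP support.

```c
#include <stddef.h>

/* OpenMP-parallel GEMM sketch (hypothetical name, row-major storage):
 * the rows of C are divided statically among threads. Without -fopenmp
 * the pragma is ignored and the function runs serially, unchanged. */
void gemm_omp(size_t m, size_t n, size_t k,
              double alpha, const double *A, const double *B,
              double beta, double *C)
{
    long M = (long)m;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < M; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[(size_t)i * k + p] * B[p * n + j];
            C[(size_t)i * n + j] = alpha * acc + beta * C[(size_t)i * n + j];
        }
}
```

Pthreads, Cilk, and TBB express the same row-partitioning idea with explicit threads, spawned tasks, and parallel loop templates respectively; the study compares how each affects the performance of these routines.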