BLAS / GEMM/POTRF

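This article examines why the high performance computing community places so much weight on general matrix multiplication (GEMM): most Level 3 Basic Linear Algebra Subprograms (BLAS) routines can be expressed in terms of GEMM, and the performance of linear algebra solvers depends on it. It studies how to optimize GEMM for high performance on three Intel processor architectures, including the new Intel MIC architecture. It also studies how four shared-memory parallel languages (OpenMP, Pthreads, Cilk, and TBB) affect routines such as GEMM, TRSM, SYRK, and Cholesky (POTRF), showing which language is better suited to writing such routines and which architectural features have the greatest impact on performance.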

The high performance computing (HPC) community is obsessed with the general matrix-matrix multiply (GEMM) routine. This obsession is not without reason. Most, if not all, Level 3 Basic Linear Algebra Subprograms (BLAS) can be written in terms of GEMM, and the performance of many higher-level linear algebra solvers (e.g., LU, Cholesky) depends on GEMM's performance. Achieving high performance on GEMM is highly architecture dependent, so for each new architecture that comes out, GEMM has to be programmed and tested to achieve maximal performance. Moreover, as emerging computer architectures feature wider vector units and multi-core to many-core processors, GEMM performance hinges on how well these technologies are utilized. In this research, three Intel processor architectures are explored, including the new Intel MIC Architecture. Each architecture has a different vector length and number of cores. The effort given to create three Level 3 BLAS routines (GEMM, TRSM, SYRK) is examined
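To make the dependence of a solver on GEMM concrete, here is a minimal pure-Python sketch of a right-looking blocked Cholesky factorization (POTRF). It is not the implementation studied in this work. The function names (`gemm`, `potrf_unblocked`, `trsm_right_lower_trans`, `potrf_blocked`) and the block size `nb` are hypothetical and chosen only for illustration. The symmetric trailing update, which a BLAS library would normally perform with SYRK, is deliberately written as a plain GEMM call to show how a Level 3 routine can be expressed in terms of GEMM.

```python
import math

def gemm(alpha, A, B, beta, C):
    """In-place C <- alpha * A @ B + beta * C on dense lists of lists."""
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

def transpose(A):
    return [list(col) for col in zip(*A)]

def potrf_unblocked(A):
    """Lower Cholesky factor of a small SPD block (reads the lower triangle)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][p] ** 2 for p in range(j))
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = s / L[j][j]
    return L

def trsm_right_lower_trans(L, B):
    """Solve X * L^T = B for X, where L is lower triangular."""
    n = len(L)
    X = [[0.0] * n for _ in B]
    for r, row in enumerate(B):
        for j in range(n):
            s = row[j] - sum(X[r][p] * L[j][p] for p in range(j))
            X[r][j] = s / L[j][j]
    return X

def potrf_blocked(A, nb):
    """Blocked Cholesky: returns lower-triangular L with L @ L^T == A."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1. Factor the diagonal block (small POTRF).
        Lkk = potrf_unblocked([row[k:e] for row in A[k:e]])
        # 2. Solve for the panel below it (TRSM).
        panel = trsm_right_lower_trans(Lkk, [row[k:e] for row in A[e:n]])
        for i in range(k, e):
            A[i][k:e] = Lkk[i - k]
        for i in range(e, n):
            A[i][k:e] = panel[i - e]
        # 3. Trailing update A22 <- A22 - panel @ panel^T.
        #    This is a SYRK, expressed here as a GEMM call.
        if e < n:
            A22 = [row[e:n] for row in A[e:n]]
            gemm(-1.0, panel, transpose(panel), 1.0, A22)
            for i in range(e, n):
                A[i][e:n] = A22[i - e]
    # Clear the strictly upper triangle so only L remains.
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = 0.0
    return A

if __name__ == "__main__":
    A = [[4.0, 2.0, 2.0, 2.0],
         [2.0, 5.0, 3.0, 3.0],
         [2.0, 3.0, 6.0, 4.0],
         [2.0, 3.0, 4.0, 7.0]]
    for row in potrf_blocked(A, nb=2):
        print(row)
```

In this sketch nearly all of the arithmetic for large matrices happens in step 3, which is why the speed of the underlying GEMM kernel tends to dominate the speed of the whole factorization. A tuned library would also exploit symmetry in that step and use vectorized, cache-blocked kernels rather than Python loops.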
