Papers:
- GEMMFIP: Unifying GEMM in BLIS, 2302.08417.pdf (arxiv.org)
- BLISlab: A Sandbox for Optimizing GEMM, 1609.00076.pdf (arxiv.org)
- LAFF-On Programming for High Performance, ulaff.net
- Anatomy of High-Performance Matrix Multiplication, gotoPaper.pdf (utexas.edu)
- Anatomy of High-Performance Many-Threaded Matrix Multiplication, blis3_ipdps14.pdf (utexas.edu)
- PfHP: Blocking for the L1, L2, and L3 caches
- Publications Related to the FLAME Project (utexas.edu)
Books and blogs:
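- A two-minute overview of the core of the OpenBLAS matrix multiplication library: gemm
- OpenBLAS gemm: getting started from scratch
- GEMM caching
- Blocking-for-L1-L3
- Analysis of the source-code structure of OpenBLAS matrix multiplication
- Reading notes on the BLISlab tutorial
- Optimizing multithreaded matrix multiplication
- Stanford CS217 (Part 3): Accelerating GEMM computation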