1 BLAS Reference Guide
A handy reference guide to the BLAS:
- Basic Linear Algebra Subprograms: A Quick Reference Guide. http://www.netlib.org/blas/blasqr.pdf.
There are a number of implementations of the BLAS available for various architectures:
- A reference implementation in Fortran is available from http://www.netlib.org/blas/. This is an unoptimized implementation: it provides the functionality without high performance.
- The current recommended high-performance open-source implementation of the BLAS is provided by the BLAS-like Library Instantiation Software (BLIS) discussed in Unit 1.5.2. The techniques you learn in this course underlie the implementation in BLIS.
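To make "functionality without high performance" concrete, the following is a minimal sketch in C of a reference-style matrix-matrix multiply. It is not the netlib code, which is written in Fortran and also handles transposed operands. The name `ref_dgemm_nn` and its argument list are hypothetical, chosen only to resemble the BLAS `dgemm` interface, and the matrices are assumed to be stored in column-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a real BLAS symbol): computes
 *     C := alpha * A * B + beta * C
 * for column-major matrices, where A is m x k, B is k x n, and
 * C is m x n. lda, ldb, and ldc are the leading dimensions, i.e.
 * the strides between consecutive columns in memory. */
void ref_dgemm_nn(int m, int n, int k,
                  double alpha, const double *A, int lda,
                  const double *B, int ldb,
                  double beta, double *C, int ldc)
{
    /* Each leading dimension must be at least the column height. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (int j = 0; j < n; j++) {          /* columns of C */
        for (int i = 0; i < m; i++) {      /* rows of C */
            double sum = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (int p = 0; p < k; p++) {
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            }
            C[i + (size_t)j * ldc] = alpha * sum + beta * C[i + (size_t)j * ldc];
        }
    }
}
```

A plain triple loop like this produces the right answer but makes no attempt to reuse data in cache or registers or to use vector instructions. An optimized implementation of the same operation is expected to restructure these loops to do so.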
2 Vendor Implementations of the BLAS
Different vendors provide their own high-performance implementations:
- Intel provides optimized BLAS as part of their Math Kernel Library (MKL): https://software.intel.com/en-us/mkl.
- AMD’s open-source BLAS for their CPUs can be found at https://developer.amd.com/amd-cpu-libraries/blas-library/. Their implementation is based on BLIS.
- AMD also has a BLAS library for its GPUs: https://github.com/ROCmSoftwarePlatform/rocBLAS.
- Arm provides optimized BLAS as part of their Arm Performance Libraries: https://developer.arm.com/products/software-development-tools/hpc/arm-performance-libraries.
- IBM provides optimized BLAS as part of their Engineering and Scientific Subroutine Library (ESSL): https://www.ibm.com/support/knowledgecenter/en/SSFHY8/essl_welcome.html.
- Cray provides optimized BLAS as part of their Cray Scientific Libraries (LibSci): https://www.cray.com/sites/default/files/SB-Cray-Programming-Environment.pdf.
- For their GPU accelerators, NVIDIA provides cuBLAS: https://developer.nvidia.com/cublas.