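In High Performance Computing (HPC), GotoBLAS, the most influential matrix library, has finally been updated after a long wait, jumping straight from 1.26 to GotoBLAS2; it looks as though Goto rewrote it. The latest version currently available for download from the official site is GotoBLAS2-1.13_bsd.
I downloaded it today. After a build of nearly 20 minutes under MinGW, it successfully produced the library file libgoto2_penrynp-r1.13.lib.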
GotoBLAS2 has been released by the Texas Advanced Computing Center as open source software under the BSD license. This product is no longer under active development by TACC, but it is being made available to the community to use, study, and extend. GotoBLAS2 uses new algorithms and memory techniques for optimal performance of the BLAS routines. The changes in this final version target new architecture features in microprocessors and interprocessor communication techniques; also, NUMA controls enhance multi-threaded execution of BLAS routines on node. The library features optimal performance on the following platforms:
- Intel Nehalem and Atom systems
- VIA Nanoprocessor
- AMD Shanghai and Istanbul
The library includes the following features:
- Configurations for a variety of hardware platforms
- Incorporation of features of many ISAs (Instruction Set Architecture)
- Implementation of NUMA controls to assure best process affinity and memory policy
- Dynamic detection of multiple architecture components, which can be included in a single binary (for binary distributions)
I'll study the source code when I get the chance.
What are the GotoBLAS?