Algorithm:
http://www.norstad.org/matrix-multiply/index.html
A classic summary of MapReduce algorithms for matrix multiplication, covering four blocking strategies.
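To make the idea concrete, here is a minimal in-memory simulation of a one-pass blocked MapReduce multiplication in Python. It is a simplified sketch in the spirit of the strategies on that page, not a reproduction of any one of them. The function names are hypothetical, the matrices are assumed to be square lists of lists, the block size b is assumed to divide n, and the "shuffle" is just a dictionary keyed by output block (I, J).

```python
from collections import defaultdict


def split_blocks(M, b):
    """Cut a square matrix (list of lists) into b x b blocks keyed by (block_row, block_col)."""
    n = len(M)
    return {
        (bi, bj): [row[bj * b:(bj + 1) * b] for row in M[bi * b:(bi + 1) * b]]
        for bi in range(n // b)
        for bj in range(n // b)
    }


def block_matmul(X, Y):
    """Plain triple-loop product of one pair of blocks."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def block_add(X, Y):
    """Element-wise sum of two equally sized blocks."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mapreduce_matmul(A, B, b):
    """Simulated one-pass MapReduce: one reducer per output block (I, J). Assumes n % b == 0."""
    n = len(A)
    nb = n // b
    # Map phase: send block A_IK to every reducer in block row I of C,
    # and block B_KJ to every reducer in block column J of C.
    shuffle = defaultdict(list)
    for (I, K), blk in split_blocks(A, b).items():
        for J in range(nb):
            shuffle[(I, J)].append(("A", K, blk))
    for (K, J), blk in split_blocks(B, b).items():
        for I in range(nb):
            shuffle[(I, J)].append(("B", K, blk))
    # Reduce phase: reducer (I, J) computes C_IJ = sum over K of A_IK * B_KJ.
    C = [[0] * n for _ in range(n)]
    for (I, J), values in shuffle.items():
        a_blocks = {K: blk for tag, K, blk in values if tag == "A"}
        b_blocks = {K: blk for tag, K, blk in values if tag == "B"}
        acc = [[0] * b for _ in range(b)]
        for K in range(nb):
            acc = block_add(acc, block_matmul(a_blocks[K], b_blocks[K]))
        # Write the finished block back into its place in C.
        for i in range(b):
            C[I * b + i][J * b:(J + 1) * b] = acc[i]
    return C


print(mapreduce_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1))  # [[19, 22], [43, 50]]
```

Each A block is shipped to every reducer in its block row and each B block to every reducer in its block column, so every input is replicated n/b times. That replication is the communication cost analyzed in the Cost section.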
Anatomy of High-Performance Matrix Multiplication
The paper behind GotoBLAS. It analyzes different blocking strategies across the levels of a hierarchical memory.
OpenBLAS, which is based on GotoBLAS, is now the actively maintained version.
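The basic blocking idea, reordering the loops so that each step works on tiles small enough to stay in a fast level of memory, can be sketched as a plain loop-tiling routine. This is only an illustration under stated assumptions: the function name and the default tile size of 32 are hypothetical, and GotoBLAS goes much further (packing blocks of the operands into contiguous buffers and using a hand-tuned inner kernel), which this sketch does not attempt. In pure Python the interpreter overhead hides any cache benefit; the point is the loop structure.

```python
def tiled_matmul(A, B, tile=32):
    """Return A @ B for an n x m and an m x p matrix (lists of lists), computed tile by tile.

    The three outer loops walk over tiles; the inner loops touch only one tile each of
    A, B and C, so with a suitable `tile` that working set can stay in cache.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Accumulate A tile (ii, kk) times B tile (kk, jj) into C tile (ii, jj).
                for i in range(ii, min(ii + tile, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + tile, m)):
                        a, Bk = Ai[k], B[k]
                        # Innermost loop runs along rows of B and C (row-major friendly).
                        for j in range(jj, min(jj + tile, p)):
                            Ci[j] += a * Bk[j]
    return C


print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))  # [[19, 22], [43, 50]]
```

The `min(...)` bounds let the tile size be any positive integer, even one that does not divide the matrix dimensions.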
Cost:
Upper and Lower Bounds on the Cost of a Map-Reduce Computation
This paper models the tradeoff between parallelism and communication: generally, more parallelism requires more replication of the inputs and, consequently, more communication. It works through three examples, including matrix multiplication.
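As a worked example of that tradeoff, the sketch below uses the two cost measures from the paper's matrix-multiplication example as I understand them: the reducer size q (the most inputs any single reducer receives) and the replication rate r (the average number of reducers each input element is sent to). For one-round multiplication of two n × n matrices the lower bound is r ≥ 2n²/q, and a scheme that splits the rows of A and the columns of B into g equal bands meets it with q = 2n²/g and r = g. The function name is hypothetical; it simply builds that band assignment and measures q and r.

```python
def band_scheme_cost(n, g):
    """Measure (q, r) for the one-round scheme that splits rows of A and columns of B into g bands.

    Reducer (x, y) receives row band x of A and column band y of B, so there are g * g reducers.
    q is the largest reducer input size; r is the average number of reducers per input element.
    """
    assert n % g == 0, "assume the bands divide n evenly"
    w = n // g   # rows (or columns) per band
    sent = {}    # input element -> number of reducers it is shipped to
    load = {}    # reducer -> number of input elements it receives
    for x in range(g):
        for y in range(g):
            inputs = [("A", i, k) for i in range(x * w, (x + 1) * w) for k in range(n)]
            inputs += [("B", k, j) for k in range(n) for j in range(y * w, (y + 1) * w)]
            load[(x, y)] = len(inputs)
            for e in inputs:
                sent[e] = sent.get(e, 0) + 1
    q = max(load.values())
    r = sum(sent.values()) / len(sent)
    return q, r


for g in (1, 2, 4, 8):
    q, r = band_scheme_cost(8, g)
    print(g, q, r)  # r * q stays at 2 * 8 * 8 = 128
```

Halving the reducer size (more, smaller reducers, i.e. more parallelism) doubles the replication rate, which is exactly the communication cost the paper formalizes.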
http://www.gordon-taft.net/MatrixMultiplication.html
It summarizes the types of cache misses and the main caching principles for matrix multiplication.
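To see why loop order matters for cache misses, the sketch below replays the memory accesses of C += A·B through a simulated cache and counts misses. The model is a hypothetical simplification, not taken from that page: a fully associative LRU cache (so there are no conflict misses, only compulsory and capacity misses) with `lines` lines of `line` elements each, and three row-major n × n matrices laid out back to back. The function name and default parameters are illustrative assumptions.

```python
from collections import OrderedDict


def count_misses(order, n, line=8, lines=16):
    """Count misses of a simulated fully associative LRU cache for C += A*B.

    Hypothetical model: row-major n x n matrices A, B, C stored back to back,
    one access per element read or written. `order` is "ijk" or "ikj".
    """
    cache = OrderedDict()  # cache-line tag -> present, ordered from LRU to MRU
    misses = 0

    def touch(matrix, i, j):
        nonlocal misses
        addr = matrix * n * n + i * n + j  # element index in a flat address space
        tag = addr // line                 # which cache line holds that element
        if tag in cache:
            cache.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            cache[tag] = True
            if len(cache) > lines:
                cache.popitem(last=False)  # evict the least recently used line

    A, B, C = 0, 1, 2
    if order == "ijk":
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    touch(A, i, k)  # walks along a row of A
                    touch(B, k, j)  # walks down a column of B: a new line every step
                touch(C, i, j)
    elif order == "ikj":
        for i in range(n):
            for k in range(n):
                touch(A, i, k)
                for j in range(n):
                    touch(B, k, j)  # walks along a row of B
                    touch(C, i, j)  # walks along a row of C
    else:
        raise ValueError(order)
    return misses


print(count_misses("ijk", 32), count_misses("ikj", 32))
```

With n = 32 and a 16-line cache, the ijk order walks down columns of B and misses on almost every access to B, while ikj streams along rows of B and C and misses only about once per cache line it brings in. When the cache can hold all three matrices, both orders incur only compulsory misses, one per line.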
Reposted from: https://blog.51cto.com/daisy8867/1208809