How tiling optimization works in C++ AMP, using matrix multiplication as an example
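As a point of comparison for the GPU code in this article, here is a minimal CPU sketch of the computation itself: each element of the result is the dot product of one row of A with one column of B. It assumes row-major storage in flat vectors, with A being M x W, B being W x N, and C being M x N, matching the extents used in the AMP code. The function name `multiply_reference` is hypothetical and does not come from the original article.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original article): CPU reference for
// C = A * B, where A is M x W, B is W x N, C is M x N, all stored row-major.
std::vector<float> multiply_reference(const std::vector<float>& vA,
                                      const std::vector<float>& vB,
                                      int M, int W, int N) {
    std::vector<float> vC(static_cast<std::size_t>(M) * N, 0.0f);
    for (int row = 0; row < M; ++row) {
        for (int col = 0; col < N; ++col) {
            // Dot product of row `row` of A with column `col` of B.
            float sum = 0.0f;
            for (int k = 0; k < W; ++k) {
                sum += vA[row * W + k] * vB[k * N + col];
            }
            vC[row * N + col] = sum;
        }
    }
    return vC;
}
```

A reference like this is mainly useful for checking GPU results on small inputs. In the AMP version, the two outer loops disappear: `parallel_for_each` over `c.extent` runs the lambda once per output element.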
Here is our ordinary (untiled) AMP matrix multiplication:
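#include <amp.h>
// Wrap the host vectors as 2-D views: a is M x W, b is W x N, c is M x N.
Concurrency::array_view<const float, 2> a(M, W, vA);
Concurrency::array_view<const float, 2> b(W, N, vB);
Concurrency::array_view<float, 2> c(M, N, vC);
// c is only written, so its current contents need not be copied to the accelerator.
c.discard_data();
// Launch one thread per element of c.
Concurrency::parallel_for_each(
c.extent,
[=](Concurrency::index<2> idx) restrict(amp) {
int row = idx[0];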