Coarse-Grained Parallel GPU Implementation and Testing of the Convolution Operation (Optimized)
A. Boundary extension;
B. Block alignment.
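
The two optimizations above can be sketched on the host side as follows. This is a minimal illustration, not the original implementation: the source does not say which padding values or block size were used, so zero padding and a block size of 16 are assumptions, and the names `align_up` and `extend_and_align` are hypothetical.

```python
def align_up(n, multiple):
    """Round n up to the nearest multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def extend_and_align(matrix, kernel_rows, kernel_cols, block=16):
    """Extend the boundary for the kernel, then pad to block multiples.

    Assumptions (not stated in the source): padding values are zero and
    `block` is the thread-block edge length, 16 here.
    """
    rows, cols = len(matrix), len(matrix[0])

    # A. Boundary extension: add kernel_size - 1 rows/columns in total so
    # that every input element can sit under the kernel's anchor cell.
    top = (kernel_rows - 1) // 2
    left = (kernel_cols - 1) // 2
    ext_rows = rows + kernel_rows - 1
    ext_cols = cols + kernel_cols - 1

    # B. Block alignment: round both dimensions up to a multiple of `block`
    # so the grid of thread blocks covers the buffer without partial blocks.
    out_rows = align_up(ext_rows, block)
    out_cols = align_up(ext_cols, block)

    # Allocate the zero-filled buffer and copy the input into place.
    out = [[0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            out[r + top][c + left] = matrix[r][c]
    return out
```

Under these assumptions, an input of 12 rows by 9 columns with a kernel of 5 rows by 4 columns is extended to 16 by 12 and then aligned to 16 by 16. Padding once on the host lets the kernel skip per-thread bounds checks, at the cost of a slightly larger transfer.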
| Matrix Size | Number | Kernel | CPU(s) | CPU2GPU | GPU-Kernel | GPU2CPU |
|---|---|---|---|---|---|---|
| 5x4 | 1 | 5x4 | <1ms | <1ms | <1ms | <1ms |
| 12x9 | 1 | 5x4 | <1ms | <1ms | <1ms | <1ms |
| | 1 | 5x4 | <1ms | <1ms | <1ms | <1ms |
| 118x29 | 1 | 5x4 | <1ms | <1ms | <1ms | <1ms |
| 138x59 | 1 | 5x4 | <1ms | <1ms | <1ms | <1ms |
| 158x159 | 1 | 5x4 | 0.005 | <1ms | <1ms | <1ms |
| 558x559 | 1 | 5x4 | 0.041 | <1ms | 0.001 | <1ms |
| 1128x1159 | 1 | 5x4 | 0.156 | 0.002 | 0.003 | 0.002 |
2128x2159 |
1 |
5x4 |