CUDA Series (4): Parallel Task Types and Memory Allocation

Latest recommended article published 2024-07-31 19:53:05

Rachel-Zhang


Categories: C/C++ Data Structure Computer System  Tags: GPU CUDA parallel computing

Article link: https://blog.csdn.net/abcjennifer/article/details/43374009


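This is the fourth lecture in the CUDA series. It first introduces several parallel communication patterns (map, gather, scatter, stencil, transpose), then reviews the CUDA memory model and analyzes, at a high level, how to write efficient code, and finally covers control flow and one important part of it: atomic operations. Reference: Udacity CS344.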

(1). Parallel Communication Patterns

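In the previous chapter, CUDA Series (2): CUDA memory & variables, we introduced the different types of memory and variables. In this chapter we classify tasks by how their threads map onto memory, which gives the following types: Map, Gather, Scatter, Stencil, and Transpose.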

1.1 Map, Gather, Scatter

Map: one input, one output
Gather: several inputs, one output
e.g. image blur by averaging
Scatter: one input, several outputs
e.g. adding a value to its neighbors
(It is called scatter because each thread scatters its result to several memory locations.)
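
The three patterns above can be sketched as small 1D kernels. This is a minimal illustration rather than code from the course: the kernel names, the `Pattern` enum, and the `run_pattern` helper are hypothetical, error checking is omitted, and the input is assumed to be non-empty. The scatter kernel uses `atomicAdd` because several threads may write to the same output element.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Map: each thread reads one input element and writes one output element.
__global__ void map_square(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// Gather: each thread reads several inputs (itself and its two neighbors,
// clamped at the borders) and writes one output, a 3-point average.
__global__ void gather_blur3(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int l = max(i - 1, 0);
    int r = min(i + 1, n - 1);
    out[i] = (in[l] + in[i] + in[r]) / 3.0f;
}

// Scatter: each thread reads one input and adds it to several outputs
// (its left and right neighbors). Writes can collide, hence atomicAdd.
// The output buffer must be zeroed before the launch.
__global__ void scatter_to_neighbors(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i > 0)     atomicAdd(&out[i - 1], in[i]);
    if (i + 1 < n) atomicAdd(&out[i + 1], in[i]);
}

enum class Pattern { Map, Gather, Scatter };  // hypothetical selector

// Hypothetical host helper: copies the input to the GPU, runs one kernel,
// and copies the result back.
std::vector<float> run_pattern(Pattern p, const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, bytes);  // required by the scatter kernel

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    switch (p) {
        case Pattern::Map:     map_square<<<blocks, threads>>>(d_in, d_out, n); break;
        case Pattern::Gather:  gather_blur3<<<blocks, threads>>>(d_in, d_out, n); break;
        case Pattern::Scatter: scatter_to_neighbors<<<blocks, threads>>>(d_in, d_out, n); break;
    }

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Note that the gather kernel decides where to read from, while the scatter kernel decides where to write to; that difference is why only the scatter version needs atomic updates here.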

[Figure: schematic of Map, Gather & Scatter]

1.2 Stencil, Transpose

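Stencil: applied at every position of the input.
stencil input: the neighborhood of that point
stencil output: the value at that point
e.g. image blur by averaging
This also shows that stencil closely resembles gather. In fact, stencil is a special case of gather, with two extra requirements: the inputs must form a neighborhood, and every element of the input must be processed.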
Figures:
1. 2D stencil (the example shows two forms)
2. 3D stencil
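
As a rough sketch of the 2D stencil described above, the kernel below averages each interior pixel with its four direct neighbors (a 5-point neighborhood, chosen here only for illustration). The names `stencil_blur5` and `run_stencil` are hypothetical, border pixels are simply copied (an assumption, since the text does not specify border handling), and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// 2D 5-point stencil: every output element is computed from the fixed
// neighborhood (center, left, right, up, down) of the same position.
__global__ void stencil_blur5(const float* in, float* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;  // row-major layout
    // Assumption: border pixels have an incomplete neighborhood, so copy them.
    if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
        out[idx] = in[idx];
        return;
    }
    out[idx] = (in[idx] + in[idx - 1] + in[idx + 1] +
                in[idx - width] + in[idx + width]) / 5.0f;
}

// Hypothetical host helper that runs the stencil on a row-major image.
std::vector<float> run_stencil(const std::vector<float>& h_in, int width, int height) {
    size_t bytes = h_in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    stencil_blur5<<<blocks, threads>>>(d_in, d_out, width, height);

    std::vector<float> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Every thread writes exactly one output element, so every position of the input is processed, which is the property that separates a stencil from a general gather.
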
Transpose
input: matrix M
output: M^T
Figures:
1. Matrix transpose
2. Transpose represented as a vector
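
A minimal sketch of the transpose pattern on a matrix stored as a flat row-major vector, with one thread per element. The names `transpose_naive` and `run_transpose` are hypothetical; this straightforward version makes no attempt to optimize memory access, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Transpose: element (r, c) of the rows x cols input becomes
// element (c, r) of the cols x rows output. Both are row-major.
__global__ void transpose_naive(const float* in, float* out, int rows, int cols) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols) {
        out[c * rows + r] = in[r * cols + c];
    }
}

// Hypothetical host helper: returns M^T for a row-major matrix M.
std::vector<float> run_transpose(const std::vector<float>& h_in, int rows, int cols) {
    size_t bytes = h_in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((cols + threads.x - 1) / threads.x,
                (rows + threads.y - 1) / threads.y);
    transpose_naive<<<blocks, threads>>>(d_in, d_out, rows, cols);

    std::vector<float> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For example, the 2 x 3 matrix stored as `{1, 2, 3, 4, 5, 6}` comes back as the 3 x 2 matrix stored as `{1, 4, 2, 5, 3, 6}`: the values are unchanged and only their positions in the flat vector are reordered.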