A Brief Note on half-warp in CUDA

Latest recommended article published on 2024-07-18 11:20:52

Glow_raw



Tags: c++ linux artificial intelligence

Article link: https://blog.csdn.net/Vingnir/article/details/140470566


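When you first study bank conflicts, you run into the concept of a half-warp. Newer hardware architectures have de-emphasized it, however, and the unit of parallel execution is now the full warp. For an overview, I am citing a Stack Overflow page that I personally find suitable for beginners: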

What's the mechanism of the warps and the banks in CUDA? - Stack Overflow
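For context on why the half-warp shows up next to bank conflicts, here is a minimal host-side C++ sketch (not CUDA device code) that models how threads map onto shared-memory banks. The helper name `conflict_degree` is hypothetical. The model assumes 4-byte-wide banks, that thread `t` reads the 4-byte word at index `t * stride_words`, and that `stride_words` is at least 1 so every thread touches a distinct word. It treats old hardware (compute capability 1.x) as 16 banks serving a half-warp of 16 threads, and newer hardware as 32 banks serving a full warp of 32 threads.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: worst-case number of threads in one group that land
// on the same shared-memory bank. Thread t accesses 4-byte word t * stride_words.
// Assumes stride_words >= 1, so all threads touch distinct words and any two
// threads mapped to the same bank are serialized.
int conflict_degree(int threads_in_group, int num_banks, int stride_words) {
    std::vector<int> hits(num_banks, 0);
    for (int t = 0; t < threads_in_group; ++t) {
        int word = t * stride_words;   // word index accessed by thread t
        ++hits[word % num_banks];      // consecutive words map to consecutive banks
    }
    // A result of 1 means conflict-free; n means an n-way bank conflict.
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, `conflict_degree(16, 16, 1)` models a half-warp on 16 banks with unit stride, and `conflict_degree(32, 32, 2)` models a full warp on 32 banks with a stride of two words. Real hardware adds details this sketch ignores, such as broadcast when several threads read the same word.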

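The conclusion from that page is attached directly below. Since a translation might not convey it accurately, here is the original text: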

Therefore a memory transaction is immediately visible across a full warp. As a result, memory requests are issued at the per-warp level, rather than per-half-warp. However, a full memory request can only retrieve 128 bytes at a time. Therefore, for data sizes larger than 32 bits per thread per transaction, the memory controller may still break the request down into a half-warp size.

My view is that, especially for a beginner, it's not necessary to have a detailed understanding of half-warp. It's generally sufficient to understand that it refers to a group of 16 threads executing together and it has implications for memory requests.
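The arithmetic behind the quoted 128-byte limit can be sketched as follows. This is a simplified host-side C++ model, not a description of what the memory controller does in every case: it assumes a warp of 32 threads, a 128-byte ceiling per request as stated in the quote, and that each thread accesses a contiguous, aligned element. The function names `requests_per_warp` and `threads_per_request` are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kWarpSize = 32;       // threads per warp
constexpr std::size_t kRequestBytes = 128;  // per-request limit taken from the quote

// Number of 128-byte requests needed to serve one warp-wide access,
// assuming contiguous, aligned data (rounded up).
constexpr std::size_t requests_per_warp(std::size_t bytes_per_thread) {
    return (kWarpSize * bytes_per_thread + kRequestBytes - 1) / kRequestBytes;
}

// Number of threads covered by each request under the same assumptions:
// 32 is a full warp, 16 a half-warp, 8 a quarter-warp.
constexpr std::size_t threads_per_request(std::size_t bytes_per_thread) {
    return kWarpSize / requests_per_warp(bytes_per_thread);
}
```

With 4 bytes per thread, the warp needs 32 × 4 = 128 bytes, which fits in one request. With 8 bytes per thread (for example a `double`), it needs 256 bytes, so the model yields two requests of 16 threads each, which is where the half-warp granularity reappears.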