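When you first study bank conflicts, you run into the concept of a half-warp. Newer hardware architectures have de-emphasized it, though: the unit of parallel execution is now the full warp. For an overview, I'm citing a web page that I personally think suits beginners: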
What's the mechanism of the warps and the banks in CUDA? - Stack Overflow
The conclusion from that page is quoted verbatim here, so that nothing is lost in paraphrase:
Therefore a memory transaction is immediately visible across a full warp. As a result, memory requests are issued at the per-warp level, rather than per-half-warp. However, a full memory request can only retrieve 128 bytes at a time. Therefore, for data sizes larger than 32 bits per thread per transaction, the memory controller may still break the request down into a half-warp size.
My view is that, especially for a beginner, it's not necessary to have a detailed understanding of half-warp. It's generally sufficient to understand that it refers to a group of 16 threads executing together and it has implications for memory requests.
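To make the 128-byte figure in the quote concrete, here is a small arithmetic sketch of my own. It is not from the Stack Overflow answer, and it is only a simplified model: it assumes every thread in the warp accesses the same number of bytes, that the accesses are contiguous and aligned, and that a single request is capped at 128 bytes as the quote states. The names `WARP_SIZE`, `REQUEST_BYTES`, and `split_warp_request` are made up for illustration.

```python
WARP_SIZE = 32        # threads in a full warp
REQUEST_BYTES = 128   # per-request limit, taken from the quoted answer


def split_warp_request(bytes_per_thread):
    """Return (number of requests, threads served per request).

    Simplified model (an assumption, not a hardware specification):
    the warp's total demand is cut into 128-byte pieces.
    """
    total_bytes = WARP_SIZE * bytes_per_thread
    # Ceiling division: how many 128-byte requests cover the whole warp.
    requests = max(1, -(-total_bytes // REQUEST_BYTES))
    threads_per_request = WARP_SIZE // requests
    return requests, threads_per_request


for size in (4, 8, 16):
    n, group = split_warp_request(size)
    print(f"{size} bytes/thread: {n} request(s), {group} threads each")
```

Under this model, 4 bytes per thread (for example a `float`) fits in one request for the full warp, while 8 bytes per thread (for example a `double`) needs two requests of 16 threads each, which is exactly the half-warp case the quote mentions.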