AMD GCN - Vega Instruction Set Architecture

Yongqiang Cheng

Article link: https://blog.csdn.net/chengyq116/article/details/89002716

https://rocmdocs.amd.com/en/latest/GCN_ISA_Manuals/testdocbook.html

1. GCN ISA Manuals

https://rocmdocs.amd.com/en/latest/GCN_ISA_Manuals/GCN-ISA-Manuals.html

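Graphics Core Next (GCN): AMD's GPU architecture family, literally the "next-generation graphics core"
Instruction Set Architecture (ISA): the instructions, registers, and memory model that software targets on a processor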

GCN 1.1
http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture1.pdf
ISA Manual for Hawaii (Sea Islands Series Instruction Set Architecture)

GCN 2.0
http://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf
ISA Manual for Fiji and Polaris (Graphics Core Next Architecture, Generation 3)

Vega
https://rocmdocs.amd.com/en/latest/GCN_ISA_Manuals/testdocbook.html
Vega Instruction Set Architecture

Inline GCN ISA Assembly Guide
https://rocmdocs.amd.com/en/latest/GCN_ISA_Manuals/GCN-ISA-Manuals.html

Vega: the star Vega, α Lyrae (Alpha Lyrae, Alpha Lyr or α Lyr), in the constellation Lyra
vega [ˈviːgə]: n. marsh, meadow, brown alluvial soil
Polaris (North Star or Pole Star): n. the pole star
Fiji: the island country of Fiji

2. Terminology

Scalar ALU (SALU)
The scalar ALU operates on one value per wavefront and manages all control flow.

Vector ALU (VALU)
The vector ALU maintains vector GPRs that are unique to each work-item and executes arithmetic operations independently on each work-item.

Work-item
A single element of work: one element of the dispatch grid or, in graphics, a pixel or vertex.

Workgroup
A workgroup is a collection of wavefronts that can synchronize with each other quickly and share data through the Local Data Share.

Wavefront
A collection of 64 work-items that execute in parallel on a single GCN processor.

Dispatch
A dispatch launches a 1D, 2D, or 3D grid of work to the GCN processor array.

GCN Processor
The Graphics Core Next shader processor combines a scalar ALU and a vector ALU and can run complex programs on behalf of a wavefront.
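
The definitions above imply some simple arithmetic: a dispatch grid is cut into workgroups, and each workgroup into wavefronts of 64 work-items. The following is a minimal sketch of that bookkeeping, not an AMD API. The function name `dispatch_summary` and the OpenCL-style `global_size` / `local_size` parameters are assumptions made for illustration, as is the rule that a partially filled wavefront still counts as one wavefront.

```python
import math

WAVEFRONT_SIZE = 64  # work-items per wavefront, from the Wavefront definition


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def dispatch_summary(global_size, local_size):
    """Count workgroups and wavefronts for a 1D, 2D, or 3D dispatch.

    global_size and local_size are hypothetical OpenCL-style tuples giving
    the grid size and the workgroup size in each dimension.
    """
    if len(global_size) != len(local_size) or not 1 <= len(global_size) <= 3:
        raise ValueError("expected matching 1D, 2D, or 3D sizes")
    if any(n <= 0 for n in (*global_size, *local_size)):
        raise ValueError("sizes must be positive")

    # Number of workgroups needed to cover the grid in every dimension.
    workgroups = math.prod(ceil_div(g, l) for g, l in zip(global_size, local_size))
    items_per_group = math.prod(local_size)
    # Assumption: a partially filled wavefront still occupies a whole wavefront.
    waves_per_group = ceil_div(items_per_group, WAVEFRONT_SIZE)
    return {
        "workgroups": workgroups,
        "work_items_per_workgroup": items_per_group,
        "wavefronts_per_workgroup": waves_per_group,
        "total_wavefronts": workgroups * waves_per_group,
    }


if __name__ == "__main__":
    print(dispatch_summary((1024, 1024), (16, 16)))
```

Under these assumptions, a 16 x 16 workgroup holds 256 work-items and therefore four wavefronts, while a 100-item workgroup needs two wavefronts, the second only partly filled.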

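terminology [ˌtɜː(r)mɪˈnɒlədʒi]: n. the specialized terms used in a particular field
scalar [ˈskeɪlə(r)]: n. a quantity that has magnitude only; adj. of or relating to a scalar, not a vector
wave front: in physics, a surface of constant phase of a wave
vector general purpose register (VGPR)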

3. Program Organization

Data Sharing

The AMD GCN stream processors can share data between different work-items, and data sharing can significantly boost performance. The figure below shows the memory hierarchy that is available to each work-item.

Figure: Shared Memory Hierarchy

Local Data Share (LDS)

Each Compute Unit has a 64 kB local data share (LDS), a memory space that enables low-latency communication between the work-items of a work-group or of a wavefront. The LDS is configured with 32 banks, each with 512 entries of 4 bytes (32 × 512 × 4 bytes = 64 kB). It contains 32 integer atomic units to enable fast, unordered atomic operations. This memory can be used as a software cache for predictable re-use of data, as a data exchange mechanism for the work-items of a work-group, or as a cooperative way to enable efficient access to off-chip memory.
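
The bank layout can be made concrete with a small model. The sketch below checks that 32 banks of 512 four-byte entries add up to 64 kB, and estimates how many distinct entries land in the same bank for a given access pattern. It assumes that consecutive 4-byte entries are interleaved across banks (bank = entry index mod 32); the text above does not state the mapping, so treat it as an illustrative assumption. The helper names `lds_bank` and `worst_bank_conflict` are hypothetical.

```python
from collections import Counter

NUM_BANKS = 32          # from the text
ENTRIES_PER_BANK = 512  # from the text
ENTRY_BYTES = 4         # from the text
LDS_BYTES = NUM_BANKS * ENTRIES_PER_BANK * ENTRY_BYTES  # 65536 bytes = 64 kB


def lds_bank(byte_addr):
    """Bank holding a byte address, assuming entry-interleaved banks."""
    if not 0 <= byte_addr < LDS_BYTES:
        raise ValueError("address outside the 64 kB LDS")
    return (byte_addr // ENTRY_BYTES) % NUM_BANKS


def worst_bank_conflict(byte_addrs):
    """Largest number of distinct 4-byte entries that map to one bank.

    A result of 1 means the accesses spread over the banks without conflict
    in this simplified model; real hardware scheduling is more involved.
    """
    entries = {addr // ENTRY_BYTES for addr in byte_addrs}
    for addr in byte_addrs:
        lds_bank(addr)  # range check only
    per_bank = Counter(entry % NUM_BANKS for entry in entries)
    return max(per_bank.values())


if __name__ == "__main__":
    for stride in (1, 2, 32):
        addrs = [i * stride * ENTRY_BYTES for i in range(32)]
        print(f"stride of {stride} entries:", worst_bank_conflict(addrs))
```

In this model, 32 work-items reading consecutive entries touch 32 different banks, while a stride of 32 entries sends every access to bank 0, which is the pattern that padding a shared array is usually meant to avoid.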

Global Data Share (GDS)

The AMD GCN devices use a 64 kB global data share (GDS) memory that can be used by the wavefronts of a kernel on all Compute Units. It provides 128 bytes of memory access per cycle to the processing elements. The GDS is configured with 32 banks, each with 512 entries of 4 bytes, and any processor can access any location. It contains 32 integer atomic units to enable fast, unordered atomic operations. This memory can be used as a software cache to store important control data for compute kernels, for reduction operations, or as a small global shared surface. Data can be preloaded from memory before kernel launch and written back to memory after kernel completion. The GDS block also contains support logic for unordered append/consume and domain launch ordered append/consume operations on buffers in memory. These dedicated circuits enable fast compaction of data and the creation of complex data structures in memory.
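
To illustrate what unordered append means for data compaction, here is a minimal host-side simulation, not GDS code. Each simulated work-item that wants to keep its value takes the next free slot from a shared counter and writes there, so the output is dense but its order depends on which work-item arrives first. The class `AppendCounter`, the function `compact`, and the shuffled completion order are all hypothetical stand-ins. The 128 bytes per cycle figure is consistent with each of the 32 banks serving one 4-byte entry per cycle, which is an assumption and not something the text states.

```python
import random

GDS_BANKS = 32
GDS_ENTRY_BYTES = 4
# Assumption: one 4-byte entry per bank per cycle, which matches 128 bytes/cycle.
GDS_BYTES_PER_CYCLE = GDS_BANKS * GDS_ENTRY_BYTES


class AppendCounter:
    """Hypothetical stand-in for a hardware append counter."""

    def __init__(self):
        self.value = 0

    def append(self):
        """Return the next free slot and advance the counter (atomic on hardware)."""
        slot = self.value
        self.value += 1
        return slot


def compact(values, keep, seed=0):
    """Pack the values that satisfy keep() densely into an output buffer."""
    counter = AppendCounter()
    out = [None] * len(values)
    order = list(range(len(values)))
    # Simulate work-items finishing in an arbitrary order.
    random.Random(seed).shuffle(order)
    for work_item in order:
        if keep(values[work_item]):
            out[counter.append()] = values[work_item]
    return out[:counter.value]


if __name__ == "__main__":
    packed = compact(list(range(16)), lambda v: v % 2 == 0)
    print(len(packed), sorted(packed))
```

The result always contains exactly the kept values with no gaps, but only the sorted view is stable across runs with different seeds, which is the trade-off the unordered variant makes compared with the ordered append/consume mode.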