GLSL std140 Layout Rules

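This article takes a close look at the std140 layout rules in OpenGL, a standard for aligning the memory of uniform blocks. It covers the alignment rules for the different data types (scalars, vectors, and matrices) and how structures and arrays are arranged. The std140 layout can waste memory, but it ensures efficient access to the data from shaders. The article also shows how to compute each member's aligned offset and walks through the calculation with an example.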


Standard Uniform Block Layout

The following example illustrates the rules specified by the "std140"
    layout.

      layout(std140) uniform Example {

                      // Base types below consume 4 basic machine units
                      //
                      //       base   base  align
                      // rule  align  off.  off.  bytes used
                      // ----  ------ ----  ----  -----------------------
        float a;      //  1       4     0    0    0..3
        vec2 b;       //  2       8     4    8    8..15
        vec3 c;       //  3      16    16   16    16..27
        struct {      //  9      16    28   32    (align begin)
          int d;      //  1       4    32   32    32..35
          bvec2 e;    //  2       8    36   40    40..47
        } f;          //  9      16    48   48    (pad end)
        float g;      //  1       4    48   48    48..51
        float h[2];   //  4      16    52   64    64..67 (h[0])
                      //                    80    80..83 (h[1])
                      //  4      16    84   96    (pad end of h)
        mat2x3 i;     // 5/4     16    96   96    96..107 (i, column 0)
                      //                   112    112..123 (i, column 1)
                      // 5/4     16   124  128    (pad end of i)
        struct {      //  10     16   128  128    (align begin)
          uvec3 j;    //  3      16   128  128    128..139 (o[0].j)
          vec2 k;     //  2       8   140  144    144..151 (o[0].k)
          float l[2]; //  4      16   152  160    160..163 (o[0].l[0])
                      //                   176    176..179 (o[0].l[1])
                      //  4      16   180  192    (pad end of o[0].l)
          vec2 m;     //  2       8   192  192    192..199 (o[0].m)
          mat3 n[2];  // 6/4     16   200  208    208..219 (o[0].n[0], column 0)
                      //                   224    224..235 (o[0].n[0], column 1)
                      //                   240    240..251 (o[0].n[0], column 2)
                      //                   256    256..267 (o[0].n[1], column 0)
                      //                   272    272..283 (o[0].n[1], column 1)
                      //                   288    288..299 (o[0].n[1], column 2)
                      // 6/4     16   300  304    (pad end of o[0].n)
                      //  9      16   304  304    (pad end of o[0])
                      //  3      16   304  304    304..315 (o[1].j)
                      //  2       8   316  320    320..327 (o[1].k)
                      //  4      16   328  336    336..339 (o[1].l[0])
                      //                   352    352..355 (o[1].l[1])
                      //  4      16   356  368    (pad end of o[1].l)
                      //  2       8   368  368    368..375 (o[1].m)
                      // 6/4     16   376  384    384..395 (o[1].n[0], column 0)
                      //                   400    400..411 (o[1].n[0], column 1)
                      //                   416    416..427 (o[1].n[0], column 2)
                      //                   432    432..443 (o[1].n[1], column 0)
                      //                   448    448..459 (o[1].n[1], column 1)
                      //                   464    464..475 (o[1].n[1], column 2)
                      // 6/4     16   476  480    (pad end of o[1].n)
                      //  9      16   480  480    (pad end of o[1])
        } o[2];
      };

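By default, how the uniforms in a uniform block are fetched from buffer storage is implementation-dependent. An application can use the query functions that OpenGL provides to find out the offset of each uniform.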

In a shader, the layout qualifier controls how uniforms are arranged inside a uniform block. With the std140 layout rules, the offset of every uniform in the block can be derived without querying.

If a uniform block is declared in several shaders that are linked into a single program, linking fails unless the block declarations, including their layout qualifiers, all match.

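With the std140 storage layout, the members of a structure are stored in the buffer in monotonically increasing order of their declaration. The structure and each of its members have a base offset and a base alignment, and the aligned offset is computed by rounding the base offset up to a multiple of the base alignment. The base offset of the first member of a structure is taken from the aligned offset of the structure itself. The base offset of all other structure members is derived by taking the offset of the last basic machine unit consumed by the previous member and adding one. Each structure member is stored in memory at its aligned offset. The members of a top-level uniform block are laid out in buffer storage by treating the uniform block as a structure with a base offset of zero.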

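The specification's description of how aligned offsets are computed is rather convoluted (perhaps my English is to blame; many official specification documents are text only, with no diagrams to explain things in more detail), but the computation is broadly similar to how member offsets are determined for a C struct.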

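(1) The first member of a uniform interface block always has a base offset and an aligned offset of 0.
(2) The base offset of the current member is the previous member's aligned offset plus the number of basic machine units (bytes) that the previous member consumed: base offset = previous aligned offset + bytes consumed by the previous member. Rounding this base offset up to a multiple of the current member's base alignment gives the current member's aligned offset.
Take the aligned offset of float h[2] as an example:
g has an aligned offset of 48 and consumes 4 bytes, so the base offset of h is 48 + 4 = 52. h is an array, so its elements take the base alignment of a vec4, that is 4 x 4 = 16 bytes. Rounding 52 up to a multiple of 16 gives 64, which becomes the start address of h[0]: aligned offset = 64.
For h[1]: base offset = aligned offset of h[0] + the 4 bytes consumed by h[0] = 64 + 4 = 68. Rounding up to a multiple of the base alignment 16 gives 80, so h[1] starts at 80. Once h[1] is placed, base offset = 80 + the 4 bytes consumed by h[1] = 84, and rounding 84 up to a multiple of 16 gives 96, the aligned offset after the padding at the end of h.
(3) The offsets of the remaining block members are computed in the same way.
Note that structures and arrays need extra steps in this calculation: a structure is aligned at its start (align begin) and padded at its end (pad end), and an array is padded at its end (pad end).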
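
The calculation for g and h above can be traced in a few lines of Python. This is only a sketch: `align_up` and the variable names are hypothetical helpers chosen for illustration, not part of OpenGL, and 4-byte components are assumed.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


VEC4_ALIGN = 16  # base alignment of a vec4 with 4-byte components (assumption)

# float g: aligned offset 48, consumes 4 bytes.
g_offset, g_size = 48, 4

# float h[2]: its base offset is the first byte after g.
h_base = g_offset + g_size                        # 52
h0_offset = align_up(h_base, VEC4_ALIGN)          # start of h[0]
h1_offset = align_up(h0_offset + 4, VEC4_ALIGN)   # start of h[1]
after_h = align_up(h1_offset + 4, VEC4_ALIGN)     # pad end of h

print(h_base, h0_offset, h1_offset, after_h)  # 52 64 80 96
```

The same two steps (add the bytes consumed, then round up) repeat for every member of the block.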

  (1) If the member is a scalar consuming <N> basic machine units, the
      base alignment is <N>.
      For example, a float, int, uint, or bool (N = 4) is aligned to 4 bytes.

  (2) If the member is a two- or four-component vector with components
      consuming <N> basic machine units, the base alignment is 2<N> or
      4<N>, respectively.
      For example, a vec2 (N = 4) is aligned to 8 bytes and a vec4 to 16 bytes.

  (3) If the member is a three-component vector with components consuming
      <N> basic machine units, the base alignment is 4<N>.
      For example, a vec3 (N = 4) is aligned to 16 bytes, the same as a vec4.

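Rules 1, 2, and 3 cover basic scalar and vector types that are not in an array: their alignments are N, 2N, and 4N. N is usually 4 bytes (32 bits), which means the hardware accesses memory in 4-byte units. (Take care when computing type sizes on the C/C++ side: a bool in the block occupies 4 bytes, not 1, just like the other scalars. Using sizeof(GLfloat) to compute the size, as the specification does, is easy to misread; the type size N here should be understood as the hardware's default memory alignment, 4 bytes.)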
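
Rules 1 to 3 fit in a single expression. The sketch below uses a hypothetical helper, `base_alignment`, and assumes 4-byte components unless `n` says otherwise; it also shows one way to write a bool as a 4-byte value from the host side with Python's `struct` module.

```python
import struct


def base_alignment(components: int, n: int = 4) -> int:
    """Base alignment of a scalar (1 component) or vector (2 to 4) per rules 1-3."""
    if components not in (1, 2, 3, 4):
        raise ValueError("components must be 1, 2, 3 or 4")
    # A three-component vector is aligned like a four-component one.
    return n * (4 if components == 3 else components)


for name, comps in [("float", 1), ("vec2", 2), ("vec3", 3), ("vec4", 4)]:
    print(name, base_alignment(comps))

# A bool inside the block takes 4 bytes, so pack it as a 32-bit integer
# rather than as a 1-byte C/C++ bool.
bool_bytes = struct.pack("=I", int(True))
print(len(bool_bytes))  # 4
```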

  (4) If the member is an array of scalars or vectors, the base alignment
      and array stride are set to match the base alignment of a single
      array element, according to rules (1), (2), and (3), and rounded up
      to the base alignment of a vec4. The array may have padding at the
      end; the base offset of the member following the array is rounded up
      to the next multiple of the base alignment.

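Rule 4 covers arrays of scalars or vectors, and it is easy to misread. Are the elements inside the array aligned according to rules 1/2/3, with padding added only at the end of the array to satisfy the vec4 alignment, or is every element aligned as a vec4? (In the latter case the array as a whole is necessarily vec4-aligned as well.) Judging from the examples, it is the second case. The key point: for an array, no matter how many components (1, 2, 3, or 4) each element has, every element is aligned to the size of a vec4. Even a scalar that occupies only 4 bytes is aligned as a vec4! This shows that the std140 layout can waste a good deal of memory in some cases, which is why the std430 layout rules exist; they are used with shader storage buffer objects (SSBOs) and reduce memory use. Memory is wasted, but accessing the data in the shader is more efficient.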
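
On the host side this means a `float[]` cannot be uploaded as a tightly packed array. A minimal sketch, assuming 4-byte floats, native byte order, and a hypothetical helper name `pack_float_array_std140`:

```python
import struct


def pack_float_array_std140(values):
    """Pack floats as a std140 float array: one float per 16-byte slot."""
    out = bytearray()
    for v in values:
        out += struct.pack("=f", v)  # 4 bytes of data
        out += b"\x00" * 12          # 12 bytes of padding up to the vec4 stride
    return bytes(out)


h = pack_float_array_std140([1.0, 2.0])
print(len(h))                           # 32, versus 8 for a tightly packed array
print(struct.unpack_from("=f", h, 16))  # (2.0,): h[1] sits 16 bytes after h[0]
```

When a block holds many scalars, declaring them as a vec4 array and indexing components in the shader avoids most of this padding.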

  (5) If the member is a column-major matrix with <C> columns and <R>
      rows, the matrix is stored identically to an array of <C> column
      vectors with <R> components each, according to rule (4).

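This one is easy to understand: a column-major matrix is stored as an array of its columns, and each column is aligned according to the array rule. Because array elements, whether scalars or vectors, are always aligned as a vec4, and the rows or columns of a matrix are treated as vectors, mat2, mat3, and mat4 can be viewed as arrays of vec2, vec3, and vec4 respectively.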
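
As an illustration, the `mat2x3 i` member of the example block (2 columns of 3 floats) can be packed column by column. This is a sketch under the assumption of 4-byte float components and native byte order; `pack_matrix_std140` is a hypothetical helper, not an OpenGL function.

```python
import struct


def pack_matrix_std140(columns):
    """Pack a column-major float matrix, given as a list of columns, for std140."""
    out = bytearray()
    for col in columns:
        out += struct.pack(f"={len(col)}f", *col)  # the column's components
        out += b"\x00" * (16 - 4 * len(col))       # pad the column to 16 bytes
    return bytes(out)


mat2x3 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 columns, 3 rows
blob = pack_matrix_std140(mat2x3)
print(len(blob))                            # 32
print(struct.unpack_from("=3f", blob, 16))  # (4.0, 5.0, 6.0): column 1
```

Relative to the start of the matrix, the columns land at offsets 0 and 16, which matches the 96 and 112 shown for `i` in the example block.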

  (6) If the member is an array of <S> column-major matrices with <C>
      columns and <R> rows, the matrix is stored identically to a row of
      <S>*<C> column vectors with <R> components each, according to rule
      (4).

Since a single matrix is aligned as a vec4 in memory, an array of matrices is necessarily aligned as a vec4 as well.

  (7) If the member is a row-major matrix with <C> columns and <R> rows,
      the matrix is stored identically to an array of <R> row vectors
      with <C> components each, according to rule (4).

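Row-major matrices follow the same alignment rules as column-major ones: they obey the alignment rules for arrays of vectors, with the rows taking the place of the columns.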
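
One consequence is that the storage order changes the footprint of a non-square matrix. The sketch below assumes 4-byte components, so every row or column vector occupies one 16-byte slot; `matrix_footprint` is a hypothetical helper.

```python
def matrix_footprint(cols: int, rows: int, row_major: bool = False) -> int:
    """Bytes consumed by a float matrix in std140, including per-vector padding."""
    vectors = rows if row_major else cols  # rule 7 stores rows, rule 5 stores columns
    return vectors * 16                    # each vector occupies a vec4-sized slot


# mat2x3 has 2 columns and 3 rows.
print(matrix_footprint(2, 3))                  # 32 (two padded vec3 columns)
print(matrix_footprint(2, 3, row_major=True))  # 48 (three padded vec2 rows)
```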

  (8) If the member is an array of <S> row-major matrices with <C> columns
      and <R> rows, the matrix is stored identically to a row of <S>*<R>
      row vectors with <C> components each, according to rule (4).

  (9) If the member is a structure, the base alignment of the structure is
      <N>, where <N> is the largest base alignment value of any of its
      members, and rounded up to the base alignment of a vec4. The
      individual members of this sub-structure are then assigned offsets 
      by applying this set of rules recursively, where the base offset of
      the first member of the sub-structure is equal to the aligned offset
      of the structure. The structure may have padding at the end; the 
      base offset of the member following the sub-structure is rounded up
      to the next multiple of the base alignment of the structure.

  (10) If the member is an array of <S> structures, the <S> elements of
       the array are laid out in order, according to rule (9).

For uniform blocks laid out according to these rules, the minimum buffer
object size returned by the UNIFORM_BLOCK_DATA_SIZE query is derived by
taking the offset of the last basic machine unit consumed by the last
uniform of the uniform block (including any end-of-array or
end-of-structure padding), adding one, and rounding up to the next
multiple of the base alignment required for a vec4.
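
Taken together, the rules can be checked against the Example block at the top of this article. The following is a simplified sketch, not a complete implementation: it assumes 4-byte components and column-major matrices only, and the type constructors (`vec`, `array`, `mat`, `struct_`) and the functions `base_align` and `place` are hypothetical names invented for this illustration.

```python
def align_up(x, a):
    return (x + a - 1) // a * a

def vec(n): return ("vec", n)                        # scalar when n == 1
def array(elem, count): return ("array", elem, count)
def mat(cols, rows): return array(vec(rows), cols)   # rule 5: array of columns
def struct_(*members): return ("struct", members)    # members: (name, type) pairs

def base_align(t):
    if t[0] == "vec":                                # rules 1-3
        return 4 * (4 if t[1] == 3 else t[1])
    if t[0] == "array":                              # rules 4-6 and 10
        return align_up(base_align(t[1]), 16)
    return align_up(max(base_align(m) for _, m in t[1]), 16)  # rule 9

def place(t, base, path, out):
    """Record aligned offsets under `path`; return the next base offset."""
    start = align_up(base, base_align(t))
    if t[0] == "vec":
        out[path] = start
        return start + 4 * t[1]
    end = start
    if t[0] == "array":
        for i in range(t[2]):
            end = align_up(end, base_align(t))       # stride is a multiple of 16
            end = place(t[1], end, f"{path}[{i}]", out)
    else:
        for name, m in t[1]:
            end = place(m, end, f"{path}.{name}" if path else name, out)
    return align_up(end, base_align(t))              # pad end

example = struct_(
    ("a", vec(1)), ("b", vec(2)), ("c", vec(3)),
    ("f", struct_(("d", vec(1)), ("e", vec(2)))),
    ("g", vec(1)), ("h", array(vec(1), 2)), ("i", mat(2, 3)),
    ("o", array(struct_(
        ("j", vec(3)), ("k", vec(2)), ("l", array(vec(1), 2)),
        ("m", vec(2)), ("n", array(mat(3, 3), 2)),
    ), 2)),
)

offsets = {}
size = place(example, 0, "", offsets)
print(size)                                          # 480
print(offsets["h[0]"], offsets["o[1].n[1][2]"])      # 64 464
```

Treating the block as a structure with base offset 0 makes the value returned for the top-level type equal to the block size described in the paragraph above, 480 bytes for this example. Matrix columns appear in `offsets` with an extra index, so `i[1]` is column 1 of `i`.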