Standard Uniform Block Layout
The following example illustrates the rules specified by the "std140"
layout.
layout(std140) uniform Example {
                  // Base types below consume 4 basic machine units
                  //
                  //       base   base  align
                  // rule  align  off.  off.    bytes used
                  // ----  -----  ----  -----   -----------------------
    float a;      //  1        4     0      0   0..3
    vec2 b;       //  2        8     4      8   8..15
    vec3 c;       //  3       16    16     16   16..27
    struct {      //  9       16    28     32   (align begin)
      int d;      //  1        4    32     32   32..35
      bvec2 e;    //  2        8    36     40   40..47
    } f;          //  9       16    48     48   (pad end)
    float g;      //  1        4    48     48   48..51
    float h[2];   //  4       16    52     64   64..67 (h[0])
                  //                       80   80..83 (h[1])
                  //  4       16    84     96   (pad end of h)
    mat2x3 i;     // 5/4      16    96     96   96..107 (i, column 0)
                  //                      112   112..123 (i, column 1)
                  // 5/4      16   124    128   (pad end of i)
    struct {      //  10      16   128    128   (align begin)
      uvec3 j;    //  3       16   128    128   128..139 (o[0].j)
      vec2 k;     //  2        8   140    144   144..151 (o[0].k)
      float l[2]; //  4       16   152    160   160..163 (o[0].l[0])
                  //                      176   176..179 (o[0].l[1])
                  //  4       16   180    192   (pad end of o[0].l)
      vec2 m;     //  2        8   192    192   192..199 (o[0].m)
      mat3 n[2];  // 6/4      16   200    208   208..219 (o[0].n[0], column 0)
                  //                      224   224..235 (o[0].n[0], column 1)
                  //                      240   240..251 (o[0].n[0], column 2)
                  //                      256   256..267 (o[0].n[1], column 0)
                  //                      272   272..283 (o[0].n[1], column 1)
                  //                      288   288..299 (o[0].n[1], column 2)
                  // 6/4      16   300    304   (pad end of o[0].n)
                  //  9       16   304    304   (pad end of o[0])
                  //  3       16   304    304   304..315 (o[1].j)
                  //  2        8   316    320   320..327 (o[1].k)
                  //  4       16   328    336   336..339 (o[1].l[0])
                  //                      352   352..355 (o[1].l[1])
                  //  4       16   356    368   (pad end of o[1].l)
                  //  2        8   368    368   368..375 (o[1].m)
                  // 6/4      16   376    384   384..395 (o[1].n[0], column 0)
                  //                      400   400..411 (o[1].n[0], column 1)
                  //                      416   416..427 (o[1].n[0], column 2)
                  //                      432   432..443 (o[1].n[1], column 0)
                  //                      448   448..459 (o[1].n[1], column 1)
                  //                      464   464..475 (o[1].n[1], column 2)
                  // 6/4      16   476    480   (pad end of o[1].n)
                  //  9       16   480    480   (pad end of o[1])
    } o[2];
};
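By default, how the uniforms contained in a uniform block are fetched from buffer storage is implementation-dependent. An application can query the offset of each uniform through the query functions that OpenGL provides.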
In a shader, layout qualifiers control how uniforms are laid out inside a uniform block. The std140 layout rules make it possible to derive the offset of every uniform in the block without querying.
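Once the offsets are known, the CPU-side data can be written straight to those positions. The following is a minimal sketch, not a definitive implementation: the constant names and the `pack_example_prefix` helper are made up for illustration, the offsets are copied from the "align off." column of the example above, and little-endian 32-bit floats are assumed.

```python
import struct

# Offsets taken from the "align off." column of the Example block.
OFFSET_A = 0      # float a
OFFSET_B = 8      # vec2 b
OFFSET_C = 16     # vec3 c
OFFSET_G = 48     # float g
BLOCK_SIZE = 480  # end of o[1] in the table

def pack_example_prefix(a, b, c, g):
    """Write a, b, c and g into a zero-filled buffer at their std140 offsets.

    Hypothetical helper; assumes little-endian 32-bit floats.
    """
    buf = bytearray(BLOCK_SIZE)
    struct.pack_into("<f", buf, OFFSET_A, a)
    struct.pack_into("<2f", buf, OFFSET_B, *b)
    struct.pack_into("<3f", buf, OFFSET_C, *c)
    struct.pack_into("<f", buf, OFFSET_G, g)
    return buf

buf = pack_example_prefix(1.0, (2.0, 3.0), (4.0, 5.0, 6.0), 7.0)
```

Bytes 4..7 and 28..31 stay zero: they are the padding that the table marks between a and b, and between c and the struct f.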
If a uniform block is declared in several shaders that are linked into a single program, linking fails unless the block declarations, including their layout qualifiers, all match.
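When the std140 storage layout is used, structure members are stored in the buffer in monotonically increasing order based on where they are declared. A structure and each of its members have a base offset and a base alignment, from which an aligned offset is computed by rounding the base offset up to a multiple of the base alignment. The base offset of the first member of a structure is taken from the aligned offset of the structure itself. The base offset of all other structure members is derived by taking the offset of the last basic machine unit consumed by the previous member and adding one. Each structure member is stored in memory at its aligned offset. The members of a top-level uniform block are laid out in buffer storage by treating the uniform block as a structure with a base offset of zero.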
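The rounding step is easier to follow in code. Below is a small sketch of the procedure just described; `align_up` and `place` are names chosen here for illustration, and the alignments and sizes fed in are simply read from the first three rows of the example table.

```python
def align_up(base_offset, base_alignment):
    """Round base_offset up to the next multiple of base_alignment."""
    return (base_offset + base_alignment - 1) // base_alignment * base_alignment

def place(members):
    """members: list of (name, base_alignment, bytes_consumed) tuples."""
    offsets = {}
    base_offset = 0  # a top-level block is treated as a structure at offset 0
    for name, alignment, size in members:
        aligned = align_up(base_offset, alignment)
        offsets[name] = aligned
        # Offset of the last byte consumed by this member, plus one.
        base_offset = aligned + size
    return offsets

# float a, vec2 b, vec3 c from the example block.
offsets = place([("a", 4, 4), ("b", 8, 8), ("c", 16, 12)])
print(offsets)  # {'a': 0, 'b': 8, 'c': 16}
```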
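The spec's wording for the aligned-offset calculation is rather convoluted (perhaps my English is the obstacle; many official specs are text only, with no figures to explain things further), but on the whole it works much like aligning member offsets in a C struct.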
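(1) The first member of a uniform interface block always has an aligned offset and a base offset of 0.
(2) The base offset of the current member is the previous member's aligned offset plus the number of basic machine units (bytes) that member consumed: base offset = previous aligned offset + bytes consumed by the previous member. Rounding this base offset up to a multiple of the current member's base alignment gives the current member's aligned offset.
Take the align off. of float h[2] as an example:
g has an aligned offset of 48 and consumes 4 bytes, so the base offset of h is 48 + 4 = 52. h is an array, so its elements take the base alignment of a vec4, that is 4 x 4 = 16 bytes. Rounding 52 up to a multiple of 16 gives 64, which becomes the start address of h[0]: aligned offset = 64.
For h[1]: base offset = aligned offset of h[0] + the 4 bytes h[0] consumes = 64 + 4 = 68, which rounds up to the next multiple of the base alignment 16, so h[1] starts at 80. In other words, the array stride is 16. Once h[1] is placed, base offset = 80 + 4 = 84, and rounding 84 up to a multiple of 16 gives 96, the padded end of h.
(3) The offsets of the remaining members are computed the same way.
Note that structures and arrays need extra steps in this calculation: "align begin" and "pad end" for a structure, and "pad end" for an array.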
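The h[2] walk-through above can be replayed step by step. This is only a sketch of that arithmetic; the variable names are made up here, and the starting value 48 is the aligned offset of g from the table.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment

VEC4_ALIGN = 16  # array elements are aligned like a vec4
FLOAT_SIZE = 4   # bytes consumed by one float

g_offset = 48
h_base = g_offset + FLOAT_SIZE                    # base offset of h
h0 = align_up(h_base, VEC4_ALIGN)                 # aligned offset of h[0]
h1 = align_up(h0 + FLOAT_SIZE, VEC4_ALIGN)        # aligned offset of h[1]
after_h = align_up(h1 + FLOAT_SIZE, VEC4_ALIGN)   # pad end of h

print(h_base, h0, h1, after_h)  # 52 64 80 96
```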
(1) If the member is a scalar consuming <N> basic machine units, the
base alignment is <N>.
That is, a scalar that occupies N bytes has a base alignment of N.
(2) If the member is a two- or four-component vector with components
consuming <N> basic machine units, the base alignment is 2<N> or
4<N>, respectively.
That is, a vector with 2 or 4 components of N bytes each has a base alignment of 2N or 4N.
(3) If the member is a three-component vector with components consuming
<N> basic machine units, the base alignment is 4<N>.
That is, a vector with 3 components of N bytes each has a base alignment of 4N.
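Rules 1, 2 and 3 cover the basic non-array types, scalars and vectors, whose alignments are N, 2N and 4N. N is typically 4 bytes (32 bits), which means the hardware accesses memory in 4-byte units. (Take care when computing basic type sizes on the C/C++ side: a bool here occupies 4 bytes, not 1, so every scalar takes 4 bytes. Using sizeof(GLfloat) to compute sizes, as the spec does, can be misleading; the type size N here should be read as the hardware's default memory alignment size, 4 bytes.)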
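Rules 1 to 3 fit in a few lines. The sketch below assumes 4-byte components, as in the example block; `base_alignment` and `bytes_consumed` are hypothetical helper names.

```python
def base_alignment(components, n=4):
    """Base alignment of a scalar or vector with `components` components
    of n bytes each (rules 1, 2 and 3)."""
    if components == 1:
        return n          # rule 1: scalar
    if components == 2:
        return 2 * n      # rule 2: two-component vector
    if components in (3, 4):
        return 4 * n      # rule 3 (vec3) and rule 2 (vec4)
    raise ValueError("a vector has 1 to 4 components")

def bytes_consumed(components, n=4):
    """Bytes actually occupied; a vec3 is aligned to 16 but consumes 12."""
    return components * n

for name, comps in [("float", 1), ("bvec2", 2), ("vec3", 3), ("vec4", 4)]:
    print(name, base_alignment(comps), bytes_consumed(comps))
```

Note that bvec2 is treated like any other two-component vector: each bool component takes 4 bytes, which matches the 40..47 range of e in the table.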
(4) If the member is an array of scalars or vectors, the base alignment
and array stride are set to match the base alignment of a single
array element, according to rules (1), (2), and (3), and rounded up
to the base alignment of a vec4. The array may have padding at the
end; the base offset of the member following the array is rounded up
to the next multiple of the base alignment.
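Rule 4 covers arrays of scalars or vectors, and it is easy to misread. Are the elements first aligned by rules 1/2/3, with padding added at the end of the array to satisfy the vec4 alignment, or is every element aligned as a vec4? (In the second case the array as a whole is necessarily vec4-aligned too.) Judging from the example, it is the second. The key point: in an array, no matter how many components (1, 2, 3 or 4) each element has, every element is aligned to the size of a vec4. Even a scalar that occupies only 4 bytes is aligned as a vec4! This shows that std140 can waste a fair amount of memory in some cases, which is why the std430 layout rules exist; used together with shader storage buffer objects (SSBOs), they reduce memory use. The memory is wasted, but data access in the shader is more efficient.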
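To make the cost of rule 4 concrete, here is a sketch that places an array of scalars or vectors and reports how much space it takes. `std140_array` is an illustrative name, not an OpenGL call, and the inputs come from the h and o[0].l rows of the example.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def std140_array(base_offset, elem_alignment, count):
    """Return (element offsets, base offset of the member after the array).

    Rule 4: base alignment and stride are the element's base alignment
    rounded up to that of a vec4 (16 bytes for 4-byte components).
    """
    stride = align_up(elem_alignment, 16)
    first = align_up(base_offset, stride)
    offsets = [first + i * stride for i in range(count)]
    return offsets, first + count * stride

offsets, end = std140_array(52, 4, 2)   # float h[2], base offset 52
print(offsets, end)                     # [64, 80] 96

footprint = end - offsets[0]            # bytes reserved for h
payload = 2 * 4                         # bytes that actually hold floats
print(footprint, payload)               # 32 8
```

A float array therefore uses a quarter of the space it reserves, which is the waste described above.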
(5) If the member is a column-major matrix with <C> columns and <R>
rows, the matrix is stored identically to an array of <C> column
vectors with <R> components each, according to rule (4).
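This one is easy to understand: a column-major matrix can be treated as an array of its columns, with each column aligned by the array rule. Array elements, scalar or vector, are always aligned as a vec4, and the rows or columns of a matrix are treated as vectors, so mat2, mat3 and mat4 can be viewed as arrays of vec2, vec3 and vec4 respectively.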
(6) If the member is an array of <S> column-major matrices with <C>
columns and <R> rows, the matrix is stored identically to a row of
<S>*<C> column vectors with <R> components each, according to rule
(4).
Since a matrix is vec4-aligned in memory, an array of matrices is necessarily vec4-aligned as well.
(7) If the member is a row-major matrix with <C> columns and <R> rows,
the matrix is stored identically to an array of <R> row vectors
with <C> components each, according to rule (4).
Row-major matrices follow the same alignment rules as column-major ones: the vector-array rule applies.
(8) If the member is an array of <S> row-major matrices with <C> columns
and <R> rows, the matrix is stored identically to a row of <S>*<R>
row vectors with <C> components each, according to rule (4).
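Rules 5 to 8 all reduce a matrix, or an array of matrices, to an array of vectors. The following sketch assumes 4-byte components, so every vector gets a 16-byte stride; `std140_matrix` is a name invented here. The first two calls reproduce the i and o[0].n rows of the example.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def std140_matrix(base_offset, columns, rows, count=1, row_major=False):
    """Return (vector offsets, base offset of the next member).

    Column-major: count * columns vectors of `rows` components (rules 5, 6).
    Row-major:    count * rows vectors of `columns` components (rules 7, 8).
    Assumes 4-byte components, so the stride of each vector is 16.
    """
    stride = 16
    vectors = count * (rows if row_major else columns)
    first = align_up(base_offset, stride)
    offsets = [first + i * stride for i in range(vectors)]
    return offsets, first + vectors * stride

print(std140_matrix(96, 2, 3))            # mat2x3 i: ([96, 112], 128)
print(std140_matrix(200, 3, 3, count=2))  # mat3 n[2]: six columns, ends at 304
```

Declared row-major, the same mat2x3 would be stored as three row vectors instead of two column vectors and would take 48 bytes rather than 32.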
(9) If the member is a structure, the base alignment of the structure is
<N>, where <N> is the largest base alignment value of any of its
members, and rounded up to the base alignment of a vec4. The
individual members of this sub-structure are then assigned offsets
by applying this set of rules recursively, where the base offset of
the first member of the sub-structure is equal to the aligned offset
of the structure. The structure may have padding at the end; the
base offset of the member following the sub-structure is rounded up
to the next multiple of the base alignment of the structure.
(10) If the member is an array of <S> structures, the <S> elements of
the array are laid out in order, according to rule (9).
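Putting the ten rules together, the whole example table can be reproduced by a short recursive routine. This is a sketch under stated assumptions rather than a reference implementation: the tuple-based type encoding and the `alignment` and `layout` functions are invented for this post, only 4-byte components and column-major matrices are handled, and int, bool and uint members are modelled the same way as float.

```python
def align_up(x, a):
    return (x + a - 1) // a * a

def alignment(t):
    kind = t[0]
    if kind == "vec":                       # rules 1-3
        return {1: 4, 2: 8, 3: 16, 4: 16}[t[1]]
    if kind == "array":                     # rule 4 (and 10)
        return align_up(alignment(t[1]), 16)
    if kind == "mat":                       # rule 5: array of column vectors
        return alignment(("array", ("vec", t[2]), t[1]))
    return align_up(max(alignment(m) for _, m in t[1]), 16)  # rule 9

def layout(t, base, name, out):
    """Record aligned offsets of leaf members in `out`;
    return the base offset of whatever follows `t`."""
    start = align_up(base, alignment(t))
    kind = t[0]
    if kind == "vec":
        out[name] = start
        return start + 4 * t[1]
    if kind == "mat":                       # ("mat", columns, rows)
        t, kind = ("array", ("vec", t[2]), t[1]), "array"
    if kind == "array":
        pos = start
        for i in range(t[2]):
            end = layout(t[1], pos, f"{name}[{i}]", out)
            pos = align_up(end, alignment(t))   # stride / pad end
        return pos
    pos = start                             # struct: members in order
    for member_name, member in t[1]:
        full = f"{name}.{member_name}" if name else member_name
        pos = layout(member, pos, full, out)
    return align_up(pos, alignment(t))      # pad end of struct

FLOAT, VEC2, VEC3 = ("vec", 1), ("vec", 2), ("vec", 3)
example = ("struct", [
    ("a", FLOAT), ("b", VEC2), ("c", VEC3),
    ("f", ("struct", [("d", FLOAT), ("e", VEC2)])),  # int d; bvec2 e;
    ("g", FLOAT),
    ("h", ("array", FLOAT, 2)),
    ("i", ("mat", 2, 3)),
    ("o", ("array", ("struct", [
        ("j", VEC3), ("k", VEC2), ("l", ("array", FLOAT, 2)),
        ("m", VEC2), ("n", ("array", ("mat", 3, 3), 2)),
    ]), 2)),
])

offsets = {}
size = layout(example, 0, "", offsets)
print(offsets["f.d"], offsets["h[1]"], offsets["o[1].k"], size)  # 32 80 320 480
```

Matrix columns show up with an extra index, so `offsets["i[1]"]` is column 1 of i and `offsets["o[0].n[1][2]"]` is column 2 of o[0].n[1].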
For uniform blocks laid out according to these rules, the minimum buffer
object size returned by the UNIFORM_BLOCK_DATA_SIZE query is derived by
taking the offset of the last basic machine unit consumed by the last
uniform of the uniform block (including any end-of-array or
end-of-structure padding), adding one, and rounding up to the next
multiple of the base alignment required for a vec4.
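The size rule in this last paragraph is a one-liner. In the sketch below, `uniform_block_data_size` is a hypothetical helper, not the OpenGL query itself, and the vec4 base alignment is taken to be 16 bytes (4-byte components).

```python
def uniform_block_data_size(last_unit_consumed):
    """Minimum buffer size for a std140 block, given the offset of the last
    byte consumed by its last uniform (end padding included)."""
    vec4_alignment = 16
    end = last_unit_consumed + 1
    return (end + vec4_alignment - 1) // vec4_alignment * vec4_alignment

print(uniform_block_data_size(479))  # Example block, padded end of o[1]: 480
print(uniform_block_data_size(3))    # a block holding a single float: 16
```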