Alignment in GPU-Driven Rendering

Latest recommended article published on 2024-04-06 15:51:38

安柏霖


Category: LowLevel

Article link: https://blog.csdn.net/toughbro/article/details/104288333


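I have been implementing GPU-driven rendering recently and ran into several pitfalls around alignment and padding.
The conclusion first: to avoid most of these pitfalls, design every data structure so that it pads out to a multiple of 16 bytes (one float4). That alone removes a large share of the problems.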

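Here are the key points. Some of them are not clearly documented (they may be documented somewhere, but I did not find them in the usual references).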

1. The GPU forces alignment when it reads a resource, so data whose size is not a multiple of 16 bytes is read incorrectly:

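// 28 bytes of data per instance: not a multiple of 16
struct InstData
{
    vec4 pos;
    vec3 scale;
};
// With a const buffer of InstData, the address of each instance is aligned to 16 bytes when it is read, so the data is read incorrectly.
// This version works better:
struct InstData
{
    vec4 pos;
    vec3 scale;
    float padding; // pads the struct to 32 bytes
};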
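To keep the CPU side in step with the padded shader struct above, the same layout can be mirrored in C++ and checked at compile time. This is only a minimal sketch: the plain `float` arrays standing in for `vec4` and `vec3` are an assumption, not the author's engine types.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of the padded shader struct (names reused for clarity).
struct InstData {
    float pos[4];    // vec4 pos
    float scale[3];  // vec3 scale
    float padding;   // explicit pad so every element is 32 bytes
};

// Fail the build if someone adds a field and breaks the 16-byte multiple.
static_assert(sizeof(InstData) % 16 == 0, "InstData must be a multiple of 16 bytes");
static_assert(offsetof(InstData, scale) == 16, "scale must start on the second 16-byte row");
```

With these checks in place, a layout change that breaks the 16-byte rule shows up as a compile error instead of as corrupted instance data on the GPU.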

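2. Watch for padding differences between C++ and shader types. With the data below, if the GPU reads the CPU-side InstData, the fields end up misaligned because of padding: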

// cpp:
struct InstData
{
	u64 a;
	u32 b;
}; // typically padded to 16 bytes by the C++ compiler (u64 is 8-byte aligned)

// gpu:
struct InstData
{
	uint3 a; // 12 bytes
};
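The size mismatch can be made visible on the C++ side. The sketch below assumes a typical 64-bit ABI where a 64-bit integer is 8-byte aligned; the struct names and the idea of splitting the 64-bit value into two 32-bit halves are illustrative assumptions, not something the original code does.

```cpp
#include <cstddef>
#include <cstdint>

// The CPU layout from the snippet above: on typical 64-bit ABIs the compiler
// adds 4 bytes of tail padding so that arrays keep `a` 8-byte aligned.
struct InstDataPadded {
    std::uint64_t a;
    std::uint32_t b;
};

// Hypothetical fix: store the 64-bit value as two 32-bit halves, so the CPU
// layout is exactly three 32-bit words, matching a uint3 on the GPU.
struct InstDataPacked {
    std::uint32_t aLo;
    std::uint32_t aHi;
    std::uint32_t b;
};

static_assert(sizeof(InstDataPacked) == 12, "must match the 12-byte GPU struct");
```

Uploading an array of `InstDataPadded` gives the GPU a 16-byte stride while the shader steps by 12 bytes, so every element after the first is read from the wrong offset.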

3. Structured buffer elements that are not 16-byte aligned can straddle cache lines, which lowers performance:
https://developer.nvidia.com/content/understanding-structured-buffer-performance
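To get a feel for how often this happens, the small helper below counts how many tightly packed elements cross a cache-line boundary. It is a rough sketch: the function name is made up for illustration, and the cache-line size is a parameter because the actual value depends on the hardware (128 bytes is used here as an assumption).

```cpp
#include <cstddef>

// Counts how many of `count` tightly packed elements of `stride` bytes
// cross a boundary between two cache lines of `cacheLine` bytes.
constexpr std::size_t countStraddling(std::size_t stride,
                                      std::size_t count,
                                      std::size_t cacheLine) {
    std::size_t straddling = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t firstLine = (i * stride) / cacheLine;
        const std::size_t lastLine = (i * stride + stride - 1) / cacheLine;
        if (firstLine != lastLine) {
            ++straddling;  // this element needs two cache lines to be read
        }
    }
    return straddling;
}
```

For example, `countStraddling(28, 32, 128)` reports that some 28-byte elements span two lines, while `countStraddling(32, 32, 128)` reports none, which is the motivation for padding the stride to a multiple of 16 bytes.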

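4. Within a single structure (or constant buffer), HLSL follows these packing rules:
– members are packed on at least 4-byte boundaries
– a member is never packed across a 16-byte boundary
– members are packed into a four-component vector until the next member would straddle it; that member then starts a new vector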

https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-packing-rules
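The three rules can be sketched as a small C++ function that walks the members in order and counts the 16-byte vectors they occupy. This is a simplified model, not the compiler's implementation: `packedRegisters` is a made-up name, and it only handles scalar and vector members of 1 to 4 32-bit components. Nested structs and arrays, which always start a new vector, are not modelled.

```cpp
#include <cstddef>
#include <initializer_list>

// Each entry is the component count (1..4) of one 32-bit scalar/vector member,
// in declaration order. Returns the number of 16-byte vectors used.
constexpr std::size_t packedRegisters(std::initializer_list<int> components) {
    std::size_t offset = 0;  // running offset, measured in 4-byte components
    for (int c : components) {
        const std::size_t used = offset % 4;  // components already in this vector
        if (used + static_cast<std::size_t>(c) > 4) {
            offset += 4 - used;  // member would straddle: start a new vector
        }
        offset += static_cast<std::size_t>(c);
    }
    return (offset + 3) / 4;  // round up to whole 16-byte vectors
}
```

Calling it as `packedRegisters({4, 2, 2})` models a `float4` followed by two `float2` members, and `packedRegisters({2, 4, 2})` models the same members in an order that wastes space.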

//  2 x 16byte elements
cbuffer IE
{
    float4 Val1;
    float2 Val2;  // starts a new vector
    float2 Val3;
};

//  3 x 16byte elements
cbuffer IE
{
    float2 Val1;
    float4 Val2;  // starts a new vector
    float2 Val3;  // starts a new vector
};

//  1 x 16byte elements
cbuffer IE
{
    float1 Val1;
    float1 Val2;
    float2 Val3;
};

//  1 x 16byte elements
cbuffer IE
{
    float1 Val1;
    float2 Val2;
    float1 Val3;
};

//  2 x 16byte elements
cbuffer IE
{
    float1 Val1;
    float1 Val2;
    float1 Val3;
    float2 Val4;    // starts a new vector
};


//  1 x 16byte elements
cbuffer IE
{
    float3 Val1;
    float1 Val2;
};

//  1 x 16byte elements
cbuffer IE
{
    float1 Val1;
    float3 Val2;
};

//  2 x 16byte elements
cbuffer IE
{
    float1 Val1;
    float1 Val2;
    float3 Val3;        // starts a new vector
};


// 3 x 16byte elements
cbuffer IE
{
    float1 Val1;
    struct {
        float4 SVal1;    // starts a new vector
        float1 SVal2;    // starts a new vector
    } Val2;
};

// 3 x 16byte elements
cbuffer IE
{
    float1 Val1;
    struct {
        float1 SVal1;     // starts a new vector
        float4 SVal2;     // starts a new vector
    } Val2;
};

// 3 x 16byte elements
cbuffer IE
{
    struct {
        float4 SVal1;
        float1 SVal2;    // starts a new vector
    } Val1;

    float1 Val2;   // starts a new vector
};
};
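On the CPU side, one possible way to mirror the last layout is to force 16-byte alignment with `alignas`, so that the nested struct and the member after it each start a new 16-byte row. This is a sketch under assumptions: `InnerMirror` and `IEMirror` are hypothetical names, and the expected offsets follow the comments in the listing above rather than a verified compiler dump.

```cpp
#include <cstddef>

// Mirrors the nested struct: float4 at offset 0, float1 at offset 16.
// alignas(16) rounds the size up from 20 to 32 bytes.
struct alignas(16) InnerMirror {
    float SVal1[4];
    float SVal2;
};

// Mirrors the cbuffer: the struct first, then a scalar that starts a new vector.
struct alignas(16) IEMirror {
    InnerMirror Val1;
    alignas(16) float Val2;  // forced onto the next 16-byte row
};

static_assert(sizeof(IEMirror) == 48, "expected 3 x 16-byte rows");
```

Checking the real offsets against shader reflection data is still worthwhile, since this mirror only encodes the rules as described in the linked packing documentation.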