Graphics Engine in Practice: Native RenderPass Principles and URP Applications (Part 1)

Sohu Changyou Engine Department

Published 2023-07-03 08:00:00

Article link: https://blog.csdn.net/qq_41166022/article/details/131479543

Preface

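You may have heard that enabling post-processing in a mobile game is especially expensive. Have you ever wondered where that cost actually comes from, and why mobile is singled out? The obvious answer is that phones are mentioned because their hardware architecture makes screen-space operations such as post-processing expensive.
On the other hand, beginners who learned Vulkan before version 1.2 were often frustrated too: what is a RenderPass actually for, and why does drawing a simple triangle require filling in a pile of RenderPass structures?
Starting from the hardware architecture, this article explains how RenderPass works and which scenarios it suits, shows how it is written in Vulkan, Unity SRP, and URP, and provides some practical examples.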

1. RenderPass Overview

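Understanding the TBR rendering architecture used on mobile makes the concept of a RenderPass much easier to grasp, so we start with a brief introduction to TBR.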

Tile-based Rendering

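On mobile, the CPU, GPU, and physical memory are integrated together, and video memory is virtual memory mapped onto physical memory. Having the GPU write directly to that memory while rendering is clearly not efficient, so it normally writes to a cache integrated on the GPU, usually called the on-chip buffer.
On mobile this cache is not large enough to hold an entire RenderTarget, so the GPU renders one part of the RT at a time and then transfers the finished part back to System Memory. Each part is a block (a block matches the area covered by a triangle better than a run of scanlines does, so the cache is hit more often).
This is the common rendering approach of mobile GPUs, the tile-based renderer, i.e. Tile-Based Rendering (TBR).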

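Where a triangle lands on the screen is essentially random. If triangles were rendered one after another, with every Tile touched by rasterization fetched from System Memory, rendered, and written back, the bandwidth cost would obviously be huge. TBR therefore needs a binning step: after the VS outputs vertex data in clip space, rasterization does not start immediately. Instead, the binning stage records which triangles each Tile contains and writes that data back to system memory. Only once the vertex shaders of all drawcalls have finished is each tile rasterized in turn, with the tile fetching its own triangles for rasterization and shading.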
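To make the binning step more concrete, here is a minimal CPU-side sketch in C++. It is only an illustration of the idea, not how any particular GPU implements it: the types `Vec2` and `Triangle` and the function `binTriangles` are hypothetical names introduced here, the triangles are assumed to be already projected to screen-space pixels, and the overlap test uses a conservative bounding box.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
struct Vec2 { float x, y; };
struct Triangle { Vec2 v[3]; };  // vertices already projected to screen-space pixels

// For each tile (row-major order), collect the indices of the triangles whose
// bounding box overlaps it. The bounding-box test is conservative: a listed
// triangle may not actually cover any pixel of the tile.
std::vector<std::vector<std::size_t>> binTriangles(
    const std::vector<Triangle>& tris, int width, int height, int tileSize)
{
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<std::size_t>> bins(
        static_cast<std::size_t>(tilesX) * static_cast<std::size_t>(tilesY));

    const float maxPixelX = static_cast<float>(width - 1);
    const float maxPixelY = static_cast<float>(height - 1);

    for (std::size_t i = 0; i < tris.size(); ++i) {
        const Vec2* v = tris[i].v;
        const float minX = std::min({v[0].x, v[1].x, v[2].x});
        const float maxX = std::max({v[0].x, v[1].x, v[2].x});
        const float minY = std::min({v[0].y, v[1].y, v[2].y});
        const float maxY = std::max({v[0].y, v[1].y, v[2].y});

        // Triangles entirely outside the render target touch no tile.
        if (maxX < 0.0f || maxY < 0.0f || minX >= width || minY >= height) continue;

        // Clamp to the render target, then convert pixel to tile coordinates.
        const int tx0 = static_cast<int>(std::clamp(minX, 0.0f, maxPixelX)) / tileSize;
        const int tx1 = static_cast<int>(std::clamp(maxX, 0.0f, maxPixelX)) / tileSize;
        const int ty0 = static_cast<int>(std::clamp(minY, 0.0f, maxPixelY)) / tileSize;
        const int ty1 = static_cast<int>(std::clamp(maxY, 0.0f, maxPixelY)) / tileSize;

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[static_cast<std::size_t>(ty) * tilesX + tx].push_back(i);
    }
    return bins;
}
```

After binning, a tile-based renderer would walk the tiles one by one and rasterize only the triangles listed in that tile's bin, which is what keeps each tile's working set inside the on-chip buffer.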

RenderPass

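Normally each Tile is written back to System Memory once it has been rendered, and after all Tiles are done the image is Presented to the screen; all drawcalls appear to run in a single pipeline, a single flow. Sometimes, however, that data has to be fed back to the GPU. The most common case is post-processing: to apply a Gaussian blur to the image, for example, you draw a full-screen triangle and read the previous RT output by screen-space UV.
This involves switching RTs, because in general a RenderTarget cannot be read and written at the same time, and switching RTs requires that all Tiles have been fully rendered and written back to System Memory before the switch.
Everything must be fully rendered because the RT is about to be read after the switch, and the access pattern is completely unknown. In a Gaussian blur each pixel reads its neighbors; with algorithms such as ray marching, the positions sampled in the previously rendered RT cannot be predicted at all. So an RT switch must guarantee that every preceding drawcall has finished rendering.
The write-back to System Memory is needed for the reason given above: the on-chip buffer cannot hold the whole RT. If the GPU wants to access the previous RT after the switch (for example by texture sampling), it has to go through the normal sampling path: compute the pixel position from the UV, then pull the surrounding pixels into the cache, which crosses the bandwidth between the GPU and System Memory a second time.
Switching RTs effectively interrupts the rendering flow, so the work before and after the switch can be regarded as two RenderPasses. The price of the switch is the bandwidth of one write plus many reads.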

SubPass

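As noted above, switching RTs requires all Tiles to be rendered and written back to System Memory, and an important reason is that the sampling pattern of the next RenderPass cannot be predicted. Sometimes, though, it can.
Take deferred rendering: the GBufferPass writes geometry data into the GBuffer, and the DeferredShadingPass reads the GBuffer to do the shading. DeferredShading is also a full-screen pass. It can guarantee that its RT has exactly the same size as the GBuffer, and that each shaded pixel samples only the GBuffer pixel at the same position, so the sampling pattern is simple and predictable.
This leads to an obvious optimization: once a Tile of the GBufferPass has rendered all of the triangles in that Tile, it does not need to be written back to System Memory. The DeferredShadingPass can run on the same Tile right away, with each pixel fetching the GBuffer data at its own position. This avoids writing several GBuffer textures back to System Memory and then sampling them again.
Passes in which a pixel reads only data at its own position, such as the GBufferPass and the DeferredShadingPass, can be treated as two SubPasses of the same RenderPass.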

Transient Resources

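If the pipeline contains only the GBufferPass and the DeferredShadingPass, and only the color output of the DeferredShadingPass is needed while intermediate data such as albedo, normal, and depth is not used by any later RenderPass, then these GBuffers are transient resources. A transient resource does not need an RT allocated for it: it is only intermediate data of the computation, and since no later RenderPass uses it, it never has to be transferred back to system memory. So when a resource is known to be transient and no RT is allocated for it, the memory footprint of that RT is saved.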

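Compared with writing multiple RenderPasses, a single RenderPass saves bandwidth between the on-chip cache and System Memory. It also improves pipeline parallelism, because there is no need to wait for the whole RT to be drawn, synchronize, and transition resource states. And because resources can be transient, memory usage can drop as well.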
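A back-of-the-envelope estimate helps put numbers on these savings. The C++ sketch below is a simplified model under stated assumptions, not a measurement: it assumes uncompressed attachments, that separate RenderPasses store every intermediate attachment once and read every pixel back exactly once, and that merged SubPasses keep the data entirely on chip. The struct `IntermediateTargets` and the three functions are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the intermediate attachments (e.g. a GBuffer).
struct IntermediateTargets {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t attachmentCount;
    std::uint32_t bytesPerPixel;  // per attachment, assumed uncompressed
};

// Size in bytes of all intermediate attachments together.
std::uint64_t totalBytes(const IntermediateTargets& t) {
    return std::uint64_t{t.width} * t.height * t.bytesPerPixel * t.attachmentCount;
}

// Rough per-frame traffic between the on-chip buffer and system memory
// caused by the intermediate attachments.
std::uint64_t intermediateTraffic(const IntermediateTargets& t, bool mergedAsSubpasses) {
    if (mergedAsSubpasses) return 0;  // data never leaves the on-chip buffer
    return 2 * totalBytes(t);         // one store plus one read of every pixel
}

// Memory that must back the intermediate attachments.
std::uint64_t intermediateMemory(const IntermediateTargets& t, bool transient) {
    return transient ? 0 : totalBytes(t);  // transient attachments need no backing RT
}
```

For a 1920x1080 GBuffer with three 4-byte attachments, this model gives 24,883,200 bytes of RT memory and 49,766,400 bytes of traffic per frame for the two-RenderPass version, and zero for both when the passes are merged and the attachments are transient. Real hardware adds compression and caching effects, so treat the figures as an order-of-magnitude guide only.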

2. RenderPass in Vulkan

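In Vulkan, a RenderPass is a set of Attachments, the way they are used, and the rendering work performed on them.
In other APIs, a change of RenderPass corresponds to setting a new RenderTarget. In Unity, most CommandBuffer commands are closer to traditional APIs, so at the low level every RT switch produces a new RenderPass. To make a custom render pipeline support Native RenderPass, we have to write code that is close to the native style, so we first need to understand how a RenderPass is implemented in Vulkan.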

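The example below is based on the input attachment sample. It contains one RenderPass with two SubPasses: the first SubPass outputs color and depth, and the second SubPass writes either the color or the depth value to the swap chain RT, depending on the UI setting.
For several SubPasses to form one RenderPass, first of all every SubPass must have the same RT size and the same sample count, and each pixel must read only the contents at its own position.
Second, the SubPasses of a RenderPass must be consecutive, with no other RenderPass in between. This follows directly from how it works: a tile executes one SubPass and then immediately executes the next. If it is interrupted, the Tile obviously cannot remain in the on-chip buffer, and the result is no longer a single RenderPass.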
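The conditions above can be written down as a small check. This is a sketch under assumptions rather than any engine's real merging logic: `PassInfo` and `canMergeAsSubpasses` are hypothetical names, and the input vector is assumed to already list passes that are consecutive in the frame (the second condition).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pass description used only for this illustration.
struct PassInfo {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t samples;
    // True if the pass reads earlier outputs only at the pixel it is shading.
    // Ignored for the first pass, which has no earlier pass in the group.
    bool readsPreviousOnlyAtSamePixel;
};

// `passes` must be consecutive in the frame, with nothing else in between.
bool canMergeAsSubpasses(const std::vector<PassInfo>& passes) {
    if (passes.empty()) return false;
    const PassInfo& first = passes.front();
    for (std::size_t i = 1; i < passes.size(); ++i) {
        const PassInfo& p = passes[i];
        if (p.width != first.width || p.height != first.height) return false;  // same RT size
        if (p.samples != first.samples) return false;                          // same sample count
        if (!p.readsPreviousOnlyAtSamePixel) return false;                     // pixel-local reads only
    }
    return true;
}
```

With this model, a GBuffer pass followed by a deferred shading pass of the same size can be merged, while appending a blur pass (which reads neighboring pixels) or a half-resolution pass makes the group fail the check.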

2.1 Shader Syntax and Binding

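Fetching the texture value at the current pixel position is not left to the programmer; it is fixed by the shader syntax. First, the resource is declared as a subpassInput rather than a texture, as in the following GLSL: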

layout(input_attachment_index = 0, set = 0, binding = 0) uniform subpassInput inputColor;
layout(input_attachment_index = 1, set = 0, binding = 1) uniform subpassInput inputDepth;

The value is then fetched with a load rather than a sample. Because only the value at the current pixel position can be read, no UV is passed:

void main()
{
    if (ubo.attachmentIndex == 0)
    {
        // Read the color directly
        vec3 color = subpassLoad(inputColor).xyz;
        outColor = vec4(color, 1);
    }
    if (ubo.attachmentIndex == 1)
    {
        // Read the depth directly and remap it from [range.x, range.y] to [0, 1]
        float depth = subpassLoad(inputDepth).x;
        vec3 gray = vec3((depth - ubo.range.x) / (ubo.range.y - ubo.range.x));
        outColor = vec4(gray, outColor.w);
    }
}

Vulkan has a matching Descriptor type, VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT. HLSL compiled with dxc also has dedicated types:

class SubpassInput<T> {
  T SubpassLoad();
};

class SubpassInputMS<T> {
  T SubpassLoad(int sampleIndex);
};

A SubpassInput can be declared with [[vk::input_attachment_index(i)]] SubpassInput input; and its value fetched with SubpassLoad.

2.2 Scheduling a RenderPass

When recording a CommandBuffer, the first step is BeginRenderPass. The code below has been simplified:

void buildCommandBuffers()
{
    VkCommandBufferBeginInfo cmdBufInfo = vks::initializers::commandBufferBeginInfo();

    // clear color
    VkClearValue clearValues[3];
    clearValues[0].color = { { 0.0f, 0.0f, 0.2f, 0.0f } };
    clearValues[1].color = { { 0.0f, 0.0f, 0.2f, 0.0f } };
    clearValues[2].depthStencil = { 1.0f, 0 };

    // render pass begin info
    VkRenderPassBeginInfo renderPassBeginInfo = vks::initializers::renderPassBeginInfo();
    renderPassBeginInfo.renderPass = renderPass; // the RenderPass object to bind
    renderPassBeginInfo.renderArea.offset.x = 0;
    renderPassBeginInfo.renderArea.offset.y = 0;
    renderPassBeginInfo.renderArea.extent.width = width;
    renderPassBeginInfo.renderArea.extent.height = height;
    renderPassBeginInfo.clearValueCount = 3;
    renderPassBeginInfo.pClearValues = clearValues;

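    // Multiple CommandBuffers are used for multi-frame rendering, so the loop can be ignored in this example
    for (size_t i = 0; i < drawCmdBuffers.size(); ++i) {
        // Specify the actual images used for rendering; a FrameBuffer is a set of images
        renderPassBeginInfo.framebuffer = frameBuffers[i];

        // Call BeginRenderPass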
        vkCmdBeginRenderPass(drawCmdBuffers[i], &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);

        /*
            First sub pass
            Fills the attachments
        */
        {
            vkCmdBindPipeline(drawCmdBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, pipelines.attachmentWrite);
            vkCmdBindDescriptorSets(drawCmdBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayouts.attachmentWrite, 0, 1, &descriptorSets.attachmentWrite, 0, NULL);
            scene.draw(drawCmdBuffers[i]);
        }

        /*
            Second sub pass
            Render a full screen quad, reading from the previously written attachments via input attachments
        */
        {
            vkCmdNextSubpass(drawCmdBuffers[i], VK_SUBPASS_CONTENTS_INLINE);

            vkCmdBindPipeline(drawCmdBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, pipelines.attachmentRead);
            vkCmdBindDescriptorSets(drawCmdBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayouts.attachmentRead, 0, 1, &descriptorSets.attachmentRead[i], 0, NULL);
            vkCmdDraw(drawCmdBuffers[i], 3, 1, 0, 0);
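        }

        // End the RenderPass (vkBeginCommandBuffer/vkEndCommandBuffer are omitted in this simplified listing)
        vkCmdEndRenderPass(drawCmdBuffers[i]);
    }
}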

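BeginRenderPass sets the actual RTs and the clear colors. What follows is the first SubPass by default; before the second SubPass is rendered, NextSubpass is called. How many RTs each pass has, their formats, whether they are cleared, and whether they are written back to System Memory are all decided when the RenderPass object is created.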

2.3 Creating a RenderPass

First define the Attachments used by all SubPasses of the RenderPass. This example has three: one swap chain RT, one color RT, and one depth RT:

void setupRenderPass()
{
    std::array<VkAttachmentDescription, 3> attachments{};

    // Swap chain image color attachment
    // Will be transitioned to present layout
    attachments[0].format = swapChain.colorFormat;
    attachments[0].samples = VK_SAMPLE_COUNT_1_BIT;
    attachments[0].loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
    attachments[0].storeOp = VK_ATTACHMENT_STORE_OP_STORE;
    attachments[0].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    attachments[0].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    attachments[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    attachments[0].finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

    // Input attachments
    // These will be written in the first subpass, transitioned to input attachments
    // and then read in the second subpass

    // Color
    attachments[1].format = colorFormat;
    attachments[1].samples = VK_SAMPLE_COUNT_1_BIT;
    attachments[1].loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
    attachments[1].storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    attachments[1].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    attachments[1].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    attachments[1].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    attachments[1].finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    // Depth
    attachments[2].format = depthFormat;
    attachments[2].samples = VK_SAMPLE_COUNT_1_BIT;
    attachments[2].loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
    attachments[2].storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    attachments[2].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    attachments[2].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    attachments[2].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    attachments[2].finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

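These Attachments are not actual RTs; they declare in advance the RTs that will be set. The actual RTs live in the FrameBuffer, which is set at BeginRenderPass.
The first Attachment is the swap chain RT, so nothing has to be loaded and it must be stored back to System Memory. The second and third Attachments are the color and depth of the first SubPass; they need a Clear and do not need a store.
Next, set up each SubPass's references to the Attachments: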
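The load and store settings are also what tell you which attachments could be transient. The following C++ sketch models this with local enums instead of the real Vulkan types so that it stays self-contained; `LoadOp`, `StoreOp`, `AttachmentOps`, `isTransientCandidate`, and `sampleAttachments` are hypothetical names for this illustration. In real Vulkan code, such an attachment is typically created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT and, where available, backed by memory with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT.

```cpp
#include <array>
#include <cassert>

// Local stand-ins for VkAttachmentLoadOp / VkAttachmentStoreOp (illustration only).
enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

struct AttachmentOps {
    LoadOp load;
    StoreOp store;
};

// An attachment whose previous contents are never loaded and whose results
// are never stored does not need to touch system memory at all.
constexpr bool isTransientCandidate(const AttachmentOps& a) {
    return a.load != LoadOp::Load && a.store == StoreOp::DontCare;
}

// The three attachments of the sample, in the same order:
// swap chain color, intermediate color, intermediate depth.
constexpr std::array<AttachmentOps, 3> sampleAttachments{{
    {LoadOp::Clear, StoreOp::Store},
    {LoadOp::Clear, StoreOp::DontCare},
    {LoadOp::Clear, StoreOp::DontCare},
}};
```

Applied to the sample, the swap chain attachment is not a candidate because it is stored, while the intermediate color and depth attachments are.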

    std::array<VkSubpassDescription, 2> subpassDescriptions{};

    /*
        First subpass
        Fill the color and depth attachments
    */
    VkAttachmentReference colorReference = { 1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
    VkAttachmentReference depthReference = { 2, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL };

    subpassDescriptions[0].pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
    subpassDescriptions[0].colorAttachmentCount = 1;
    subpassDescriptions[0].pColorAttachments = &colorReference;
    subpassDescriptions[0].pDepthStencilAttachment = &depthReference;

    /*
        Second subpass
        Input attachment read and swap chain color attachment write
    */

    // Color reference (target) for this sub pass is the swap chain color attachment
    VkAttachmentReference colorReferenceSwapchain = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };

    subpassDescriptions[1].pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
    subpassDescriptions[1].colorAttachmentCount = 1;
    subpassDescriptions[1].pColorAttachments = &colorReferenceSwapchain;

    // Color and depth attachment written to in first sub pass will be used as input attachments to be read in the fragment shader
    VkAttachmentReference inputReferences[2];
    inputReferences[0] = { 1, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };
    inputReferences[1] = { 2, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };

    // Use the attachments filled in the first pass as input attachments
    subpassDescriptions[1].inputAttachmentCount = 2;
    subpassDescriptions[1].pInputAttachments = inputReferences;

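In the code above, both color and depth in inputReferences are set to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. The code works, but setting the depth to VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL would be more accurate (this is an issue in the sample itself).
Some further code deals with layout transitions; it is not important here and is not expanded. The last step is to call vkCreateRenderPass: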

    // `dependencies` holds the subpass dependencies from the omitted layout-transition code
    VkRenderPassCreateInfo renderPassInfoCI{};
    renderPassInfoCI.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
    renderPassInfoCI.attachmentCount = static_cast<uint32_t>(attachments.size());
    renderPassInfoCI.pAttachments = attachments.data();
    renderPassInfoCI.subpassCount = static_cast<uint32_t>(subpassDescriptions.size());
    renderPassInfoCI.pSubpasses = subpassDescriptions.data();
    renderPassInfoCI.dependencyCount = static_cast<uint32_t>(dependencies.size());
    renderPassInfoCI.pDependencies = dependencies.data();
    VK_CHECK_RESULT(vkCreateRenderPass(device, &renderPassInfoCI, nullptr, &renderPass));
}

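RenderPass is a Vulkan concept. OpenGL ES, the older API commonly used on mobile, has similar extensions: one is FramebufferFetch (FBF) and the other is PixelLocalStorage (PLS). They are not covered here because the engineering practice in this article is mainly Unity: RenderPass is a native Vulkan feature, and both Unity's shader compiler and SRP provide methods for it. FBF and PLS, by contrast, are GLES extensions; of the two major mobile GPU families, Mali and Adreno, each supports one of the two extensions well and the other only to a limited degree. The extensions also differ in shader syntax: for HLSL, the DSL of Unity SRP shaders, hlslcc supports the FBF syntax but not PLS. This article therefore does not explore on-chip cache optimization on GLES. Readers who are interested can look at UE's mobile code, which supports both extensions.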

3. SRP's RenderPass Wrapper and Its Usage

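The following uses CustomSRP as the example.
The example has two subpasses. The first subpass outputs albedo, emission, and depth; the second subpass reads albedo and emission and writes to output. I made a small change to experiment with using depth as an input attachment, so the second subpass also reads depth.
As before, first define all the Attachments the RenderPass will use: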

AttachmentDescriptor m_Albedo = new AttachmentDescriptor(RenderTextureFormat.Default);
AttachmentDescriptor m_Emission = new AttachmentDescriptor(RenderTextureFormat.Default);
AttachmentDescriptor m_Output = new AttachmentDescriptor(RenderTextureFormat.Default);
AttachmentDescriptor m_Depth = new AttachmentDescriptor(RenderTextureFormat.Depth);

NativeArray<AttachmentDescriptor> renderPassAttachments = new NativeArray<AttachmentDescriptor>(4, Allocator.Temp);
renderPassAttachments[0] = m_Albedo;
renderPassAttachments[1] = m_Emission;
renderPassAttachments[2] = m_Output;
renderPassAttachments[3] = m_Depth;

Define the color outputs of the first subpass, albedo and emission. The numbers are indices into the renderPassAttachments array defined above:

NativeArray<int> renderPassColorAttachments = new NativeArray<int>(2, Allocator.Temp);
renderPassColorAttachments[0] = 0;
renderPassColorAttachments[1] = 1;

Define the inputs of the second subpass (albedo, emission, and depth) and its color output:

NativeArray<int> renderPassColorDepthInputAttachments = new NativeArray<int>(3, Allocator.Temp);
renderPassColorDepthInputAttachments[0] = 0;
renderPassColorDepthInputAttachments[1] = 1;
renderPassColorDepthInputAttachments[2] = 3;

NativeArray<int> renderPassOutputAttachments = new NativeArray<int>(1, Allocator.Temp);
renderPassOutputAttachments[0] = 2;

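Then bind the Attachments to specific RTs and set their Clear state. (AttachmentDescriptor is a struct, so in real code this configuration has to happen before the descriptors are copied into renderPassAttachments.)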

m_Output.ConfigureTarget(m_ColorRT, loadExistingContents:false, storeResults:true);
m_Output.ConfigureClear(new Color(0.0f, 0.0f, 0.0f, 0.0f),clearDepth:1, clearStencil:0);
m_Albedo.ConfigureClear(camera.backgroundColor,1,0);
m_Emission.ConfigureClear(new Color(0.0f, 0.0f, 0.0f, 0.0f),1,0);
m_Depth.ConfigureClear(new Color(),1,0);

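Notice that Albedo, Emission, and Depth have no ConfigureTarget call and are not bound to an actual RT. As intermediate results of the RenderPass, these buffers are not needed by anything later (assuming there is no subsequent RenderPass such as SSAO), so they are effectively transient resources (Transient Resource) that need no RT allocated for memory storage (Memory-less). This is another advantage of the RenderPass optimization.
Now BeginRenderPass can be called. The SRP APIs are BeginRenderPass, EndRenderPass, and BeginScopedRenderPass, which works with the C# using syntax: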

using ( context.BeginScopedRenderPass(camera.pixelWidth, camera.pixelHeight, samples:1, attachments:renderPassAttachments, depthAttachmentIndex:3) )
{
    //Output to Albedo & Emission
    using ( context.BeginScopedSubPass(renderPassColorAttachments, isDepthStencilReadOnly:false) )
    {
        //Opaque objects
        sortingSettings.criteria = SortingCriteria.CommonOpaque;
        drawSettings1.sortingSettings = sortingSettings;
        filterSettings.renderQueueRange = RenderQueueRange.opaque;
        context.DrawRenderers(cull, ref drawSettings1, ref filterSettings);
    }
    //Read from Albedo & Emission, then output to Output
    using ( context.BeginScopedSubPass(renderPassOutputAttachments,renderPassColorDepthInputAttachments, isDepthStencilReadOnly:true) )
    {
        //Skybox
        if(drawSkyBox)  {  context.DrawSkybox(camera);  }

        //Opaque objects
        sortingSettings.criteria = SortingCriteria.CommonOpaque;
        drawSettings2.sortingSettings = sortingSettings;
        filterSettings.renderQueueRange = RenderQueueRange.opaque;
        context.DrawRenderers(cull, ref drawSettings2, ref filterSettings);
    }
}

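Several differences from Vulkan are visible. First, a Vulkan AttachmentDescription is like the definition of a slot, whereas an SRP AttachmentDescriptor is bound to an actual RT, because SRP has no FrameBuffer in its design. The LoadStoreAction is set when the RT is bound.
Second, in Vulkan the DepthAttachment is set per subpass through a VkAttachmentReference, whereas in SRP its index is set in BeginRenderPass and BeginSubPass only sets whether depth is read-only. As a result, an entire SRP RenderPass can use at most one Depth texture as its DepthAttachment.
In BeginSubPass, setting isDepthStencilReadOnly to true means the subpass does not bind the depth buffer as a writable target, which makes it possible to read the depth buffer. At the low level this maps to a VkAttachmentReference whose layout is VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL.
SRP loads input attachments in one of two ways:
The first uses Unity's own extended hlslcc to translate HLSL to GLSL. The input is declared with a cbuffer, which the underlying compiler presumably converts into an input attachment: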

// Declaration
#define FRAMEBUFFER_INPUT_FLOAT(idx) cbuffer hlslcc_SubpassInput_f_##idx { float4 hlslcc_fbinput_##idx; }
#define FRAMEBUFFER_INPUT_FLOAT_MS(idx) cbuffer hlslcc_SubpassInput_F_##idx { float4 hlslcc_fbinput_##idx[8]; }

//load
#define LOAD_FRAMEBUFFER_INPUT(idx, v2fname) hlslcc_fbinput_##idx
#define LOAD_FRAMEBUFFER_INPUT_MS(idx, sampleIdx, v2fname) hlslcc_fbinput_##idx[sampleIdx]

The second compiles with Microsoft's dxc and uses its extension syntax:

// Declaration
#define UNITY_DXC_SUBPASS_INPUT_TYPE_INDEX(type, idx) [[vk::input_attachment_index(idx)]] SubpassInput<type##4> hlslcc_fbinput_##idx
#define UNITY_DXC_SUBPASS_INPUT_TYPE_INDEX_MS(type, idx) [[vk::input_attachment_index(idx)]] SubpassInputMS<type##4> hlslcc_fbinput_##idx

#define FRAMEBUFFER_INPUT_FLOAT(idx) UNITY_DXC_SUBPASS_INPUT_TYPE_INDEX(float, idx)
#define FRAMEBUFFER_INPUT_FLOAT_MS(idx) UNITY_DXC_SUBPASS_INPUT_TYPE_INDEX_MS(float, idx)

//load
#define LOAD_FRAMEBUFFER_INPUT(idx, v2fname) hlslcc_fbinput_##idx.SubpassLoad()
#define LOAD_FRAMEBUFFER_INPUT_MS(idx, sampleIdx, v2fname) hlslcc_fbinput_##idx.SubpassLoad(sampleIdx)

SRP wraps both approaches in macros, so in practice you just declare the input and load it.