Assume both shaders (i.e. programs, in OpenGL terms) are fully compiled and have already been assigned ids, and simply have not been linked yet.
Link: http://www.opengpu.org/bbs/forum.php?mod=viewthread&tid=1300
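Possible costs:
1. Whether a shader writes Z or uses alpha test determines whether hierarchical Z can be enabled. If a shader switch crosses that boundary, the Z buffer written so far has to be resolved first;
2. Shaders that use the texture sampler slots differently may require a different texture cache layout, which often means marking the whole texture cache dirty; (CPU API layer)
This can also involve updating and filling various shader resources, such as Constant Buffer data (the counterpart of a Uniform Buffer in OpenGL).
(If the shader being switched to has been linked before, i.e. every texture uniform already has an OpenGL id, and the texture data has not changed, a check is made on whether the 16 textures held in the render context include the ones the shader needs, and bind texture is called once for any that are missing. For non-texture uniforms whose uniform location has already been obtained, the actual value is re-uploaded only if it has changed.)
3. Some longer shaders have more instructions than fit on chip, so switching to them means loading instructions from external memory. This cost is relatively small; (GPU layer)
The cost of a shader switch also includes binding the shader to its inputs/outputs and the associated error checking, which happens on the GPU.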
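The CPU-side checks described in item 2 (skip the bind if the texture is already in the render context, skip the upload if the uniform value is unchanged) can be sketched roughly as below. This is only a minimal illustration under assumptions: `RenderContextCache` and its methods are hypothetical names, the check is simplified to a per-slot comparison, and the counters stand in for where real code would call `glBindTexture` or `glUniform1f`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical CPU-side mirror of what the render context already holds.
// Nothing here talks to OpenGL; the counters mark where real calls would go.
class RenderContextCache {
public:
    static constexpr std::size_t kTextureSlots = 16;

    // Bind `texture` to `slot` only if the slot does not already hold it.
    // Returns true when a real bind would have been issued.
    bool bindTexture(std::size_t slot, unsigned texture) {
        assert(slot < kTextureSlots);
        if (bound_[slot] == texture) return false;  // already bound, skip
        bound_[slot] = texture;
        ++bindCalls_;  // real code: glBindTexture(...)
        return true;
    }

    // Upload a float uniform only if its value differs from the last upload.
    // Uniform values belong to a program, so the key is (program, location).
    bool setUniform(unsigned program, int location, float value) {
        const auto key = std::make_pair(program, location);
        const auto it = uniforms_.find(key);
        if (it != uniforms_.end() && it->second == value) return false;
        uniforms_[key] = value;
        ++uploadCalls_;  // real code: glUniform1f(location, value)
        return true;
    }

    int bindCalls() const { return bindCalls_; }
    int uploadCalls() const { return uploadCalls_; }

private:
    std::array<unsigned, kTextureSlots> bound_{};  // 0 means "nothing bound"
    std::map<std::pair<unsigned, int>, float> uniforms_;
    int bindCalls_ = 0;
    int uploadCalls_ = 0;
};
```

Calling `bindTexture(0, 7)` twice in a row issues one bind, not two. The trade-off is that every state change now pays for a comparison on the CPU, whether or not it turns out to be redundant.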
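As I recall, the D3D runtime builds the command buffer and then sends it to the driver in batches. Each submission involves a switch from user mode to kernel mode, so it is expensive. The driver then processes the command buffer and feeds it to the graphics card piece by piece as the card becomes idle. The device state exposed by the API is maintained by the D3D runtime, which is why the pure device option exists: it lets the D3D runtime stop tracking state in exchange for higher performance. Redundant state changes are not always filtered out either, because the check itself has a cost and is often wasted. D3D leaves most of the decision on whether to filter redundant state to the driver, so that the driver can make the best choice for the hardware. (This paragraph is uncertain; check the Windows Display Driver Model.)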
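The batching idea in the paragraph above (record many commands, pay the expensive submission once per batch) can be illustrated with a toy model. This is a sketch under assumptions, not how the D3D runtime is implemented: `Command`, `BatchingRuntime` and the fixed `capacity` are all hypothetical, and a counter stands in for the user-mode to kernel-mode transition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical command: "set state `state` to `value`".
struct Command {
    int state;
    int value;
};

// Toy runtime that accumulates commands and hands them to the "driver"
// in batches, so the costly submission happens once per batch.
class BatchingRuntime {
public:
    explicit BatchingRuntime(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Record a command; submit automatically when the buffer is full.
    void record(Command cmd) {
        pending_.push_back(cmd);
        if (pending_.size() >= capacity_) flush();
    }

    // Submit whatever is pending. Each non-empty flush models one
    // user-mode to kernel-mode switch.
    void flush() {
        if (pending_.empty()) return;
        ++submissions_;
        submitted_ += pending_.size();
        pending_.clear();
    }

    std::size_t submissions() const { return submissions_; }
    std::size_t submitted() const { return submitted_; }

private:
    std::size_t capacity_;
    std::vector<Command> pending_;
    std::size_t submissions_ = 0;
    std::size_t submitted_ = 0;
};
```

With a capacity of 4, recording 10 commands and then calling `flush()` results in 3 submissions instead of 10, which is the whole point of batching.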
"Why are cards so bad at shader changes? It's because many of them can only be running one set of shaders and shader constants at a time. Remember that video cards are not like a CPU - they are a very long pipeline - it may take hundreds of thousands of clock cycles for a triangle to get from being given to the card to being rendered. GPUs are so fast because they can have huge numbers of triangles and pixels being processed at once, so although they have very poor "latency" (the time taken from an input to go all the way through to the end and finish processing), they have extremely impressive "throughput" (how many things you can finish processing in a second).
So when you change shader, many cards have to wait for their entire pipeline to drain fully empty, then upload the new shader or set of shader constants into a completely idle pipeline, and then start feeding work into the start of the pipeline again. This draining is often called a pipeline "bubble" (though it's a slightly inaccurate term from the point of view of a hardware engineer), and in general you want to avoid them."
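The CPU sends commands to the GPU and the GPU executes them asynchronously, so a shader switch may not actually have to wait for the whole pipeline to flush. This is also uncertain; check the Windows Display Driver Model.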
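Whatever the exact cost turns out to be, the quote's advice is to avoid shader changes where possible. One common way to do that, which the notes above do not describe and which is shown here only as an assumed illustration, is to order draw calls by shader before submitting them. `DrawCall`, `countShaderSwitches` and `sortByShader` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw record: which shader (program id) and which mesh.
struct DrawCall {
    unsigned shader;
    unsigned mesh;
};

// Count how many times the shader changes between consecutive draws.
std::size_t countShaderSwitches(const std::vector<DrawCall>& draws) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].shader != draws[i - 1].shader) ++switches;
    }
    return switches;
}

// Group draws that share a shader. stable_sort keeps the original
// submission order among draws that use the same shader.
void sortByShader(std::vector<DrawCall>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return a.shader < b.shader;
                     });
}
```

For a list that alternates between two shaders across four draws, sorting reduces the number of switches from 3 to 1. This only works when draw order does not matter for correctness, which is not the case for blended transparent geometry, for example.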