# brief
- Mostly things I already knew, nothing really new; it served as a review.
# algorithm overview
- Scene objects are split into occluders and occludees.
- For each occluder:
  - transform to screen space
  - bin triangles into tiles
  - rasterize triangles into the depth buffer
- For each occludee:
  - for simplicity, use its AABB and transform 7 vertices into screen space
- Why not use the GPU-generated depth buffer?
  - latency
  - the cost of the copy (GPU to CPU)
  - artifacts, again caused by the latency
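A minimal sketch of the occludee step, assuming the AABB corners have already been projected to screen space (pixel x, y plus depth z in [0, 1], smaller = closer) and that the depth buffer was cleared to 1.0f before the occluders were rasterized. `ScreenVert`, `DepthBuffer` and `OccludeeMaybeVisible` are hypothetical names, and this is one simple conservative variant, not necessarily what the talk's implementation does: it compares the box's nearest depth against every pixel of its screen-space rectangle. The notes mention transforming 7 AABB vertices; the sketch just accepts whatever projected corners the caller passes. Boxes crossing the near plane are not handled here (a caller would typically treat them as visible).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical screen-space vertex: pixel coordinates plus depth in [0, 1], smaller = closer.
struct ScreenVert { float x, y, z; };

// Hypothetical depth buffer holding the nearest occluder depth per pixel.
struct DepthBuffer {
    int width, height;
    std::vector<float> depth;  // cleared to 1.0f (far plane) before occluders are drawn
    DepthBuffer(int w, int h) : width(w), height(h), depth(size_t(w) * h, 1.0f) {}
    float at(int x, int y) const { return depth[size_t(y) * width + x]; }
};

// Conservative test: if the box's nearest depth is closer than the stored depth at any
// pixel of its screen-space rectangle, the occludee may be visible and must be drawn.
bool OccludeeMaybeVisible(const DepthBuffer& db, const std::vector<ScreenVert>& corners) {
    assert(!corners.empty());
    float minX = corners[0].x, maxX = corners[0].x;
    float minY = corners[0].y, maxY = corners[0].y;
    float nearestZ = corners[0].z;
    for (const ScreenVert& v : corners) {
        minX = std::min(minX, v.x);
        maxX = std::max(maxX, v.x);
        minY = std::min(minY, v.y);
        maxY = std::max(maxY, v.y);
        nearestZ = std::min(nearestZ, v.z);
    }
    // Clamp the rectangle to the buffer; a box entirely off-screen is not visible.
    int x0 = std::max(0, int(std::floor(minX)));
    int y0 = std::max(0, int(std::floor(minY)));
    int x1 = std::min(db.width - 1, int(std::ceil(maxX)));
    int y1 = std::min(db.height - 1, int(std::ceil(maxY)));
    if (x0 > x1 || y0 > y1) return false;
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (nearestZ < db.at(x, y)) return true;  // stored occluder is behind the box here
    return false;
}
```

Using the nearest depth for the whole rectangle can only report "maybe visible" too often, never hide a visible object, which is the safe direction for culling.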
# depth rasterization
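- Binning triangles into tiles
  - Use each triangle's screen-space bounding box to find the tiles it has to be binned into.
  - Fine for small triangles, but a long, thin triangle lying along a diagonal has a bounding box that spans many tiles it never actually covers (they did not try to solve this).
- Pixel traversal (rasterization)
  - Bounding-box traversal: visit every pixel in the triangle's bounding box and skip the ones outside the triangle.
  - Move in 2x2 pixel blocks so SSE can process four pixels at once.
  - Line (edge) equations, plus the top-left rule to decide which triangle owns pixels lying exactly on a shared edge.
  - Edge equations are evaluated incrementally: once f(x,y) is known, f(x+1,y), f(x,y+1) and f(x+1,y+1) only need a constant added to f(x,y).
  - The signed triangle area is computed to determine the triangle's orientation (winding).
  - Depth is computed with barycentric coordinates.
  - A barycentric weight is just the area of the triangle formed by the pixel and one edge, divided by the area of the whole triangle.
- Clipper
  - Triangles crossing the near plane must be clipped as well; this is done in homogeneous coordinates, before the perspective divide.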
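A scalar sketch of the pixel traversal steps above, under assumptions not taken from the talk: vertices have integer pixel coordinates with pixel (x, y) sampled at (x, y), triangles are already clipped, depth lies in [0, 1] with smaller = closer, and a positive signed area marks a front-facing triangle (clockwise on screen with y pointing down). `Vtx`, `Edge`, `IsTopLeft` and `RasterizeDepth` are hypothetical names. The 2x2 SSE blocking is left out; an SSE version would evaluate the same edge functions for four pixels per register.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical vertex: integer pixel coordinates, depth in [0, 1] (smaller = closer).
struct Vtx { int x, y; float z; };

// Edge function of edge a->b at (px, py); equals twice the signed area of (a, b, p).
static int Edge(const Vtx& a, const Vtx& b, int px, int py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Top-left rule for this winding (positive area, y pointing down):
// a top edge is horizontal with the interior below it; a left edge goes upward.
static bool IsTopLeft(const Vtx& a, const Vtx& b) {
    return (a.y == b.y && b.x > a.x) || (b.y < a.y);
}

// Rasterizes one triangle into a row-major depth buffer, keeping the nearest depth.
// Returns the number of covered pixels (handy for checking the top-left rule).
int RasterizeDepth(std::vector<float>& depth, int width, int height,
                   const Vtx& v0, const Vtx& v1, const Vtx& v2) {
    assert(depth.size() == size_t(width) * size_t(height));
    const int area2 = Edge(v0, v1, v2.x, v2.y);
    if (area2 <= 0) return 0;  // back-facing or degenerate, decided by the area's sign

    // Bounding-box traversal, clamped to the buffer.
    int minX = std::max(0, std::min({v0.x, v1.x, v2.x}));
    int maxX = std::min(width - 1, std::max({v0.x, v1.x, v2.x}));
    int minY = std::max(0, std::min({v0.y, v1.y, v2.y}));
    int maxY = std::min(height - 1, std::max({v0.y, v1.y, v2.y}));
    if (minX > maxX || minY > maxY) return 0;

    // Incremental stepping: one pixel in x adds A, one pixel in y adds B.
    const int A0 = v1.y - v2.y, B0 = v2.x - v1.x;  // edge v1->v2, weight of v0
    const int A1 = v2.y - v0.y, B1 = v0.x - v2.x;  // edge v2->v0, weight of v1
    const int A2 = v0.y - v1.y, B2 = v1.x - v0.x;  // edge v0->v1, weight of v2
    const bool tl0 = IsTopLeft(v1, v2), tl1 = IsTopLeft(v2, v0), tl2 = IsTopLeft(v0, v1);

    int row0 = Edge(v1, v2, minX, minY);
    int row1 = Edge(v2, v0, minX, minY);
    int row2 = Edge(v0, v1, minX, minY);
    int covered = 0;
    for (int y = minY; y <= maxY; ++y) {
        int w0 = row0, w1 = row1, w2 = row2;
        for (int x = minX; x <= maxX; ++x) {
            // A pixel exactly on an edge belongs to this triangle only if the edge is top or left.
            bool inside = (tl0 ? w0 >= 0 : w0 > 0) &&
                          (tl1 ? w1 >= 0 : w1 > 0) &&
                          (tl2 ? w2 >= 0 : w2 > 0);
            if (inside) {
                // Barycentric weight = sub-triangle area / full triangle area.
                float z = (w0 * v0.z + w1 * v1.z + w2 * v2.z) / float(area2);
                float& d = depth[size_t(y) * width + x];
                d = std::min(d, z);
                ++covered;
            }
            w0 += A0; w1 += A1; w2 += A2;
        }
        row0 += B0; row1 += B1; row2 += B2;
    }
    return covered;
}
```

Rasterizing the two halves of a square that share a diagonal covers every pixel exactly once, which is what the top-left rule is for. Integer edge values keep the on-edge test exact; with floating-point input, vertices are typically snapped to fixed point first so the same rule still applies.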
# optimization
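- Binning (enables parallelism across tiles and better locality)
- Frustum culling (usually done earlier in the engine, so it is rarely this stage's job)
- Vectorization with SSE (everything that can be done four at a time is: 4 pixels, 4 vertices)
- Multithreading
- Pipelining (use the previous frame's depth buffer). In practice this barely helps, because both stages compete for the same CPU resources; it only really pays off when the GPU is the bottleneck and the CPU would otherwise sit idle.
- Occludee and occluder size thresholds
  - Occludees and occluders that are too small are skipped.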
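A minimal sketch of the bounding-box binning behind the first item above, assuming a hypothetical 32x16-pixel tile grid (the notes do not give the real tile size) and a per-tile list of triangle indices. `Pt`, `BinTriangle`, `kTileW` and `kTileH` are illustrative names, not from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical tile size in pixels; the actual tile dimensions are not given in these notes.
constexpr int kTileW = 32;
constexpr int kTileH = 16;

struct Pt { float x, y; };  // screen-space position in pixels

// Appends triangleId to every tile overlapped by the triangle's screen-space bounding box.
// bins[ty * tilesX + tx] lists the triangles to rasterize in tile (tx, ty).
void BinTriangle(std::vector<std::vector<int>>& bins, int tilesX, int tilesY,
                 const Pt& a, const Pt& b, const Pt& c, int triangleId) {
    assert(bins.size() == size_t(tilesX) * size_t(tilesY));
    float minX = std::min({a.x, b.x, c.x}), maxX = std::max({a.x, b.x, c.x});
    float minY = std::min({a.y, b.y, c.y}), maxY = std::max({a.y, b.y, c.y});
    // Pixel bounds to tile range, clamped to the grid (empty if fully off-screen).
    int tx0 = std::max(0, int(std::floor(minX / kTileW)));
    int ty0 = std::max(0, int(std::floor(minY / kTileH)));
    int tx1 = std::min(tilesX - 1, int(std::floor(maxX / kTileW)));
    int ty1 = std::min(tilesY - 1, int(std::floor(maxY / kTileH)));
    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            bins[size_t(ty) * tilesX + tx].push_back(triangleId);
}
```

Each tile's list can then be rasterized independently (for example one tile per worker thread), which is where the parallelism and locality come from. A thin triangle from (0, 0) to (127, 63) on a 4x4 grid of such tiles lands in all 16 bins even though it only crosses the tiles along the diagonal, which is the weakness noted in the depth rasterization section.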
# QA
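- Why not compare against hardware occlusion queries? First, hardware occlusion queries are genuinely hard to do well. Second, Umbra has abandoned GPU occlusion queries in its latest version (my guess: there are too many GPU driver and API variations, so maintenance and optimization become too much trouble).
- Proxy models for rasterization: they asked game developers, and if the technique requires extra assets, nobody will use it. Having proxies would certainly be best, though.
- On modern PCs the GPU handles draw calls fine; the problem is submitting them. Rasterizing on the CPU, however, also adds CPU cost.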