Unified Occlusion Culling: Portals, Visibility Umbra, and HZB
In any realistic 3D scene, there are things you can see and things you can't. It sounds simple, but applying this knowledge can greatly benefit the performance of an application tasked with rendering the 3D scene in real time. Generally, we want to lower the computational cost of objects that do not contribute to the scene: those that are outside our field of view or are obscured by other objects. The Z-buffer is the last stage in the rendering pipeline where an object can be identified as "invisible"; in this case, the object in question is a fragment, and it can only be identified as invisible when its depth value is "deeper" than something that has already been rendered at the same raster position. The Z-buffer test is invoked for each and every fragment of a higher-level object being rendered (the polygonal representation of a "thing" in the scene, such as a person, a wall, or a car) and thus imposes a rather high computational cost for visibility determination, even with recent optimizations made to hardware Z-buffer mechanisms.
Figure 1: A complex scene making use of occlusion culling optimizations. (Image from Yann L)
We would prefer to decide whether an object will be invisible well before incurring the cost of a per-fragment test on the object's entire polygonal representation. Mechanisms for doing so are referred to as "culling mechanisms" because they select only certain objects out of a larger set (a flock) of objects. Some well-known culling mechanisms include backface culling, which prevents the rasterization of any fragments on a polygon known to be "facing away" from the viewing position, and frustum culling, which prevents the consideration of polygons belonging to objects that are entirely outside the viewing frustum (field of view). Hierarchical spatial trees are also employed to speed up the decision of whether objects are outside the viewing frustum. In sparse scenes, such as those found in space-flight simulators, those two mechanisms alone are usually sufficient to improve the rendering performance of an application. Many scenes are not sparse, though, and when a more down-to-earth point of view is used in denser scenes, culling to the view frustum alone tends to leave a lot of needless overdraw. Perspective projection places great emphasis on the screen size of objects near the view position, such that they collectively obscure large numbers of farther-away objects; we would like to figure out which far-away objects are completely hidden, and thus avoid bogging down our rendering pipeline with them. The mechanism for performing this operation is known as "occlusion culling", and it has been the focus of many research papers. In this article, I propose a system that combines several effective occlusion mechanisms for optimal performance in a variety of scene types, with a potential speedup of several hundred percent in some scenes. The specific occlusion mechanisms involved are portal systems, visibility umbra, and hierarchical Z-buffers.
The Culling Systems
Portals: A portal system divides the empty space of a scene into "sectors" and makes connections between them using "portals", creating a graph of spaces (nodes) connected by portals (edges). The spaces are generally rooms and corridors, and the portals are windows and doorways, although in dense portal systems such as those generated from a BSP, the portals are far more numerous. Rendering a portalized scene starts with the sector that contains the view position and tests whether the view frustum contains any of that sector's portals. When a portal is found within the frustum, the frustum is clipped to the extents of that portal, and the process continues in a depth-first recursive manner into the sector connected through that portal; the recursion stops when all visible portals in a sector have been traversed. This allows the scene traversal to render only the objects that are visible through holes in the scene geometry, implicitly ignoring any geometry hidden behind solid chunks of the scene. We don't require a perfect clipping operation for our portal-constrained objects, so the same object may be reached through more than one portal; a per-object flag can be used to identify those objects that have already been rendered in the current traversal. Portal occlusion is useful when you have tightly enclosed scenes that are full of holes and have lots of potential overdraw, such as building interiors.
Figure 2a: Portalized scene with clipped frusta shown.
Figure 2b: Graph representation of portalized scene. Lines indicate portal connections.
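The traversal described above can be sketched in a few lines of Python. This is a minimal 2D illustration under several assumptions that are not part of the original text: the frustum is reduced to an angular interval around a camera looking down +x (so angles never wrap), objects are points, and the names Portal, Sector and traverse are hypothetical. Portals leading back to a sector already on the current recursion path are skipped, which stands in for the back-facing portal test a real engine would more likely use.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Portal:
    a: tuple        # first endpoint (x, y) of the portal segment
    b: tuple        # second endpoint (x, y)
    target: str     # name of the sector on the far side

@dataclass
class Sector:
    objects: dict = field(default_factory=dict)   # object name -> (x, y)
    portals: list = field(default_factory=list)

def angle(cam, p):
    """Direction from the camera to point p, in radians."""
    return math.atan2(p[1] - cam[1], p[0] - cam[0])

def traverse(sectors, name, cam, frustum, drawn=None, path=()):
    """Depth-first portal traversal; returns object names in draw order."""
    if drawn is None:
        drawn = []
    path = path + (name,)
    lo, hi = frustum
    sector = sectors[name]
    for obj, pos in sector.objects.items():
        # 'obj not in drawn' plays the role of the per-object rendered flag.
        if obj not in drawn and lo <= angle(cam, pos) <= hi:
            drawn.append(obj)
    for portal in sector.portals:
        if portal.target in path:
            continue                      # do not walk back the way we came
        p_lo, p_hi = sorted((angle(cam, portal.a), angle(cam, portal.b)))
        # Clip the current frustum to the portal's angular extent.
        c_lo, c_hi = max(lo, p_lo), min(hi, p_hi)
        if c_lo < c_hi:                   # portal is at least partly in view
            traverse(sectors, portal.target, cam, (c_lo, c_hi), drawn, path)
    return drawn

# Hypothetical scene: room A -> doorway -> room B -> C (in view) and D (not).
door = ((2.0, -0.5), (2.0, 0.5))
sectors = {
    "A": Sector({"table": (1.0, 0.0), "chair": (-1.0, 0.0)},
                [Portal(*door, "B")]),
    "B": Sector({"crate": (4.0, 0.0), "barrel": (3.0, 2.0)},
                [Portal(*door, "A"),
                 Portal((5.0, -0.2), (5.0, 0.2), "C"),
                 Portal((3.0, 3.0), (4.0, 3.0), "D")]),
    "C": Sector({"lamp": (6.0, 0.0)}),
    "D": Sector({"statue": (3.5, 4.0)}),
}
visible = traverse(sectors, "A", (0.0, 0.0), (-math.pi / 4, math.pi / 4))
```

In this scene the barrel lies inside the original frustum but outside the frustum clipped to the doorway, so it is skipped, and sector D is never entered because its portal falls outside the clipped frustum.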
Modern 3D engines have been using portals in their scene representations for quite some time. Doom and Build (Duke Nukem 3D, Shadow Warrior) used explicit 2D sectors and portals, with heights applied to the floors and ceilings to give a more 3D effect. Quake used CSG to construct levels, and in the process of compiling a static PVS for each sector, actually compiled a BSP and calculated the PVS using the portals generated from the BSP. Descent had a scene composed exclusively of cuboidal sectors connected by portals situated at the faces of the cuboids, giving one of the first true 3D portal engines. Jedi Knight was also an explicit sector and portal engine, with arbitrary 3D sectors and portal placement allowing much more realistic environments. Unreal also made use of portals, both manually placed and generated from BSP compilation. Most of these portal representations used very densely concentrated portals, generating them at every convex edge in a scene. This of course becomes prohibitively expensive as the number of polygons (and thus edges) increases, and so more recent engines have encouraged manually placed portals or have generated portals from a simplified set of scene geometry.
Visibility Umbra: This is a method of extruding a frustum of the "hidden space" behind the solid convex portion of an object, and it is used for testing whether whole objects are inside or outside the extruded frustum. The frustum extrusion follows the same mechanism as that used for generating stencil shadow volumes, although with a few more rules: the silhouette edges of the object's occlusion mesh are determined, and planes are extruded through them away from the camera position; the occlusion mesh must be convex so that the frustum is composed of a convex set of planes. This method is less common in modern games than portals, but some engine developers have cited use of this mechanism, with one prior publication on the topic available on gamasutra.com [Bacik]. This occlusion mechanism is useful when the scene has large and/or mostly convex objects that tend to obscure each other and other objects; Bacik makes use of it to have smoothly rolling hills occlude other hills and valleys in outdoor landscape scenes. The focus on large objects is due to the inability of the frusta of multiple objects to combine into a single testable volume, which makes it difficult for a large number of small objects to effectively occlude anything. A fused volume could be constructed using BSP methods, but I have not seen much prior research on that topic, and the cost of performing a BSP compilation each frame is hard to weigh against the savings of not rendering objects that are hidden only by combinations of occluders, as one is O(n^2) and the other is O(k), where n and k are independent variables. Also of note is the fact that this is a geometric approach, and thus can be quite precise, assuming the occlusion mesh being used is an accurate representation of the visible geometry.
Figure 3: An Occlusion frustum built from an occluder, using current viewer position. (Image from [Bacik])
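As a rough illustration of the extrusion and the containment test, here is a minimal 2D sketch in Python. It is a simplification made up for this example rather than Bacik's implementation: the occluder is a convex polygon with counter-clockwise vertices, the "silhouette edges" of the 3D case become two silhouette vertices, the extruded planes become half-planes, and the tested object is a bounding circle. The function names (half_plane, build_umbra, circle_hidden) are hypothetical.

```python
import math

def half_plane(p, q, inside):
    """Half-plane (nx, ny, d) bounded by line pq; 'inside' is on the positive side."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length          # unit normal of the line
    d = -(nx * p[0] + ny * p[1])
    if nx * inside[0] + ny * inside[1] + d < 0:
        nx, ny, d = -nx, -ny, -d                # flip so 'inside' is positive
    return nx, ny, d

def build_umbra(cam, poly):
    """Build the hidden-space volume behind a convex CCW polygon seen from cam."""
    n = len(poly)

    def faces_camera(i):
        # Edge i runs poly[i] -> poly[i+1]; the interior of a CCW polygon is
        # on its left, so the edge faces the camera when cam is on its right.
        (ax, ay), (bx, by) = poly[i], poly[(i + 1) % n]
        return (bx - ax) * (cam[1] - ay) - (by - ay) * (cam[0] - ax) < 0

    front = [faces_camera(i) for i in range(n)]
    # Silhouette vertices sit where a front-facing edge meets a back-facing one.
    sil = [poly[i] for i in range(n) if front[i - 1] != front[i]]
    if len(sil) != 2:
        raise ValueError("camera must be outside a convex occluder")
    cx = sum(x for x, _ in poly) / n
    cy = sum(y for _, y in poly) / n
    # Point on the far side of the silhouette chord, used to orient the near plane.
    far = (sil[0][0] + sil[1][0] - cam[0], sil[0][1] + sil[1][1] - cam[1])
    return [half_plane(cam, sil[0], (cx, cy)),  # side plane through camera
            half_plane(cam, sil[1], (cx, cy)),  # other side plane
            half_plane(sil[0], sil[1], far)]    # near plane at the occluder

def circle_hidden(umbra, center, radius):
    """True only if the whole circle lies inside every half-plane of the umbra."""
    return all(nx * center[0] + ny * center[1] + d >= radius
               for nx, ny, d in umbra)

# Hypothetical occluder: a square wall in front of a camera at the origin.
wall = [(2.0, -1.0), (3.0, -1.0), (3.0, 1.0), (2.0, 1.0)]
umbra = build_umbra((0.0, 0.0), wall)
behind = circle_hidden(umbra, (6.0, 0.0), 1.0)        # fully behind the wall
peeking = circle_hidden(umbra, (6.0, 2.5), 1.0)       # sticks out past one side
in_front = circle_hidden(umbra, (1.0, 0.0), 0.5)      # between camera and wall
```

Requiring the signed distance to be at least the radius on every plane keeps the test conservative: an object that straddles any plane is treated as visible, which is also why several small occluders cannot cooperate in this scheme.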
HZB: A hierarchical Z-buffer (HZB) is a method of rasterizing certain portions of a scene into a depth buffer and then performing image-space occlusion queries on a hierarchical structure constructed from the depth buffer. The hierarchical structure contains conservative depth information, in a pyramidal structure similar to a mipmap chain, and allows very fast queries to be performed on many objects in a scene. The conservative depth information is the minimum and/or maximum depth value, downsampled in a 4:1 manner as with mipmaps. The simplest HZB stores only the maximum depth at each sample position, which allows rejection (definitely invisible) at coarse levels of the structure; you can also store minimum depth values to accept objects (definitely visible). Generally, only objects determined or known to be "good occluders" are rasterized into the root-level depth buffer, and coarse representations of other objects are tested against the hierarchical structure. These coarse representations can be hierarchical spatial models, like those used in frustum culling, to further optimize the culling mechanism by considering a large group of objects with a single test. Due to the high cost of generating the hierarchy, all occluders must be selected and rasterized before visibility testing can be performed; this can be problematic if both near and far objects in the scene provide occlusion, and thus the selection of occluders is a matter that benefits from tweaking for specific scenes or scene types.
Figure 4: A simple HZB construction with a 4x4 base depth buffer.
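The construction and query can be sketched as follows. This is a minimal Python illustration of the max-depth-only variant, under a few assumptions of my own: larger depth values are farther away, empty pixels hold the far-plane value 1.0, the base buffer is square with a power-of-two size, and the tested object is reduced to a screen-space rectangle plus its nearest depth. The 4x4 buffer uses made-up values and the function names are hypothetical.

```python
def build_hzb(depth):
    """Return pyramid levels [base, ..., 1x1], each texel the max of its 2x2 children."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([[max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                            prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
                        for x in range(half)]
                       for y in range(half)])
    return levels

def is_occluded(levels, rect, nearest_z, level=None, tx=0, ty=0):
    """True if a rectangle (x0, y0, x1, y1, inclusive base pixels) whose
    nearest depth is nearest_z lies behind the stored occluders everywhere."""
    if level is None:
        level = len(levels) - 1              # start at the 1x1 top level
    x0, y0, x1, y1 = rect
    size = 1 << level                        # base pixels per texel side
    bx, by = tx * size, ty * size
    if bx > x1 or by > y1 or bx + size - 1 < x0 or by + size - 1 < y0:
        return True                          # texel does not touch the rectangle
    if nearest_z > levels[level][ty][tx]:
        return True                          # behind the farthest depth stored here
    if level == 0:
        return False                         # the object may show at this pixel
    # Inconclusive at this level: refine into the four child texels.
    return all(is_occluded(levels, rect, nearest_z, level - 1,
                           2 * tx + dx, 2 * ty + dy)
               for dy in (0, 1) for dx in (0, 1))

# Hypothetical 4x4 occluder depth buffer (1.0 = nothing rasterized there).
depth = [
    [0.2, 0.2, 1.0, 1.0],
    [0.2, 0.3, 1.0, 1.0],
    [0.5, 0.5, 0.4, 0.4],
    [0.5, 0.5, 0.4, 0.6],
]
levels = build_hzb(depth)
hidden = is_occluded(levels, (0, 0, 1, 1), 0.35)    # behind the top-left occluder
partly = is_occluded(levels, (0, 0, 1, 1), 0.25)    # pokes out in front at one pixel
open_sky = is_occluded(levels, (0, 0, 3, 3), 0.9)   # overlaps pixels with no occluder
```

The first query is answered one level above the base, without touching individual pixels, which is where the hierarchy earns its keep; the other two have to descend until they find a pixel where the object could be visible.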
Now that they have all been introduced, let's make some comparisons. The HZB and visibility umbra methods are similar in that they find spaces that are definitely invisible, as opposed to the portal method, which finds spaces that are definitely visible - I will refer to the portal method as "negative occlusion" and the others as "positive occlusion". Both positive occlusion methods require some metric for determining "good occluders" from the set of objects immediately visible in a scene - some objects might always be good occluders, some might be good occluders only when they occupy a certain amount of screen space, and some might just not be useful for occlusion. Examples of the three types, respectively: walls and doors, rocks, and shrubbery. The primary advantage that the HZB method has over visibility umbra is the inherent ability to fuse distinct occluders into a single useful occlusion structure, whereas the visibility umbrae of several objects are more difficult to fuse together. HZB is a fairly high-cost computation to perform, due to its per-fragment nature and per-frame hierarchical reconstruction, and thus should be used mainly in scenes where portals and visibility umbra tend to fail. Such HZB-friendly scenes include a dense forest where many relatively thin tree trunks combine to form large swaths of blocked visibility, or a city scene where skyscrapers form a forest of buildings in much the same manner. It is important to note that the HZB is implemented on the CPU, so that high-level scene visibility can run in parallel with the fine-grained Z-buffering done by video hardware, without needing to synchronize the two processors. The resolution of the root-level depth buffer in an HZB does not need to equal the size of the viewport used on the video hardware; a generally useful resolution is around 128x128 to 256x256 pixels. In scenes where HZB is unnecessary, it will simply slow things down by tying up the CPU with rasterization.
Unification of the culling mechanisms
None of these methods is a silver bullet for solving your occlusion culling issues, but when combined they can be quite effective.
To Be Continued ...
[Yann] Yann Lombard. Various responses in threads on gamedev.net.
[Bacik] Michal Bacik. "Rendering the Great Outdoors: Fast Occlusion Culling for Outdoor Environments." Gamasutra, July 17, 2002.