DirectX Performance Considerations

zzisong

Published 2007-01-10 12:23:00

Article link: https://blog.csdn.net/zzisong/article/details/1478958


Performance Considerations
One of the points of knowing all this background is to understand that the pipeline
is full of bottlenecks waiting to happen. A chain is only as strong as its weakest link,
and every stage in the pipeline is a potential weak link. Looking back, before 3D
cards, video cards were mostly measured by their fill rate. Fill rate is basically the
measure of how fast the card can paint pixels on the screen. When cards were slower
than the CPU feeding them, applications were “fill rate limited.” When fast 3D cards
became available, the CPUs could not process the geometry fast enough, and the
applications could be “geometry limited.” Here are a few things to watch out for:
■ Be aware of how often you are manipulating data on the CPU. It doesn’t
matter how fast your CPU or GPU is if the path between them is slow to
traverse. The system bus is very slow compared to the data paths through the
card itself.
■ Be aware of how much geometry is actually being pumped through the
geometry pipeline, including geometry that you don’t necessarily see.
Because of clipping, you might see only a little simple geometry on the
screen, but there might be a lot running through your shader.
■ Be aware of your texture size. Cards are increasingly limited by their ability to
move textures from memory to the screen. The processors are fast enough
that the speed of the memory is actually the bottleneck! Don’t use enormous
textures if you don’t have to.
■ Be aware of what your texture blending is doing. Make sure you’re not doing
more than you have to.
■ Be aware of which tests you have enabled. If you don’t need the depth test,
make sure it is disabled.
I go into these and many other performance considerations in the following
chapters. The performance point to remember here is that you need to keep in
mind each phase of the pipeline has its own performance characteristics and
considerations. If your card says that it can render 100 million polygons per second,
but you are only getting 10 million, you may be limited by something you’re doing with
textures. Knowing how the pipeline works and how hard your application hits each
part will help you sort out performance problems.
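
The weakest-link idea can be put into a few lines of code. What follows is only a sketch: the stage names and rates are hypothetical placeholders, and real numbers have to come from profiling your own application. It treats each stage as having a sustainable rate and reports the slowest one.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One pipeline stage and the rate it can sustain. Names and numbers are
// hypothetical; real values come from profiling your own application.
struct Stage {
    std::string name;
    double polygonsPerSecond;
};

// The pipeline as a whole runs no faster than its slowest stage.
// Returns a pointer to that stage, or nullptr if the list is empty.
const Stage* findBottleneck(const std::vector<Stage>& stages) {
    auto it = std::min_element(
        stages.begin(), stages.end(),
        [](const Stage& a, const Stage& b) {
            return a.polygonsPerSecond < b.polygonsPerSecond;
        });
    return it == stages.end() ? nullptr : &*it;
}
```

With a geometry stage rated at 100 million polygons per second and a texturing stage that sustains only 10 million, findBottleneck returns the texturing stage, which matches the example in the text.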

Performance Considerations
Before I say anything about performance, remember that performance is ultimately
a function of what exactly you’re trying to do. Therefore, these are some rules of
thumb, but you should always experiment to find the best approach for you. The
following is a laundry list of things to consider:
■ In general, locking the vertex buffer is a costly operation and should be
avoided when possible. Often a lock is necessary, but I’ve heard horror stories
of people locking, setting three vertices, rendering, and repeating. Instead,
add many more vertices in a single lock.
■ Be very aware of the flags used when creating and locking the vertex buffer.
Different flags or flag combinations could make measurable performance
differences. When in doubt, experiment until you find the best setup.
■ Usually, the more vertices you can send to the card in a single call, the better.
The numbers differ for different cards and different applications, but a
general rule of thumb is that the number should be in the thousands, not in
the tens of vertices.
■ Calling SetVertexShader is a costly operation. It forces the device to stop and
prepare for the new type of data. If you’re dealing with different FVFs, group
your geometry by FVF to avoid switching back and forth. Some people even
go so far as to render every other frame in different orders to avoid
unnecessary switches!
■ Because switching formats is costly, create formats based on attributes, not on
objects. Therefore, don’t create one format for cars and another format for
airplanes if they share the same attributes. Instead, create one format for
single textured vertices, another for multitextured vertices, and so on, and
use the correct format for all appropriate objects.
■ Setting the stream source is also a costly operation. Again, rendering calls
should be batched together to take the most advantage of a single vertex
buffer before switching.
■ As stated earlier, the more geometry you can process in a single call, the
better. This applies to DrawPrimitive as well. In some cases, you might not be
able to send thousands of vertices at the same time, but in general, do as
much as you can.

Most of the rules of thumb can be summed up by the frequently heard rule
“Batching is everything.” Always maximize the amount of work you can do in a
single call, and minimize the number of times you force the device to switch to a
new state. In my work, I almost never lock the vertex buffer (except during
initialization), and I minimize how often I reset anything. If you are frequently
locking the buffer or making format changes, take a good long look and ask
yourself whether you really need to do that. Chances are pretty good that you’ll find
many ways to optimize your code. But remember, the best way to optimize is by
experimenting and finding the best approach for the given task.
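
To make the batching rule concrete, here is a minimal sketch that sorts a list of pending draws so that draws sharing a vertex format and vertex buffer sit next to each other, and then counts how many state changes the submission order would cost. DrawItem and its integer IDs are hypothetical stand-ins; nothing here calls Direct3D.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// A hypothetical record of what one draw call needs. The integer IDs stand in
// for a vertex format (FVF) and a vertex buffer; they are not Direct3D types.
struct DrawItem {
    int formatId;
    int bufferId;
    int vertexCount;
};

// Order the items so that draws sharing a format, and then a buffer, are adjacent.
void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) {
            return std::tie(a.formatId, a.bufferId) <
                   std::tie(b.formatId, b.bufferId);
        });
}

// Count how many times the format or buffer would have to be set when the
// items are submitted in order. The first item always costs one of each.
std::size_t countStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].formatId != items[i - 1].formatId) ++changes;
        if (i == 0 || items[i].bufferId != items[i - 1].bufferId) ++changes;
    }
    return changes;
}
```

For four draws that alternate between two formats and two buffers, the unsorted order costs eight state changes and the sorted order costs four. Whether the sort pays for itself depends on the scene, so measure before and after.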

Performance Considerations
There aren’t many performance considerations to worry about when setting
transforms. Of course, there is a slight cost associated with setting transforms and
computing matrices, but it’s minimal. In general, batching is always a good thing,
but don’t worry too much about the cost of SetTransform. In fact, a call to
SetTransform is probably cheaper than locking the vertices and transforming them
on the CPU. In most cases, it is best to take advantage of the hardware T&L and use
matrices to transform data. It is much cheaper to send 16 floating-point values to
the card than it is to send several thousand new vertices.
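
The arithmetic behind that last sentence is easy to check. The sketch below assumes a hypothetical vertex layout (position, normal, one set of texture coordinates); your own layout will differ, but the comparison has the same shape.

```cpp
#include <cstddef>

// Hypothetical vertex layout: position, normal, and one set of texture coordinates.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// A 4x4 transform matrix is 16 floats.
constexpr std::size_t kMatrixBytes = 16 * sizeof(float);

// Bytes that must cross the bus if the vertices are transformed on the CPU
// and uploaded again.
constexpr std::size_t vertexUploadBytes(std::size_t vertexCount) {
    return vertexCount * sizeof(Vertex);
}
```

With 4-byte floats, the matrix is 64 bytes, while 5,000 vertices in this layout come to 160,000 bytes.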

Performance Considerations
I’ve hinted at several performance considerations throughout this chapter, but it is
worthwhile to reiterate some of them here. First, as always, it’s best to keep the
number of vertices as low as possible. This is a matter of not only dealing with
optimized 3D assets, but also rendering those assets as efficiently as possible. As I’ve
shown, the best way to render geometry is with either indexed strips or indexed lists.
Indexed lists might be slightly less efficient, but overall they are easier to work with.
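
A quick count shows where the savings come from. This is a sketch that assumes the simplest case, a single row of quads that can be drawn as one continuous strip; real meshes usually need several strips, so treat the strip figure as a best case.

```cpp
#include <cstddef>

// Vertices submitted by a non-indexed triangle list: three per triangle,
// with shared corners duplicated.
constexpr std::size_t listVertexCount(std::size_t triangles) {
    return 3 * triangles;
}

// Indices in an indexed triangle list: three per triangle.
constexpr std::size_t indexedListIndexCount(std::size_t triangles) {
    return 3 * triangles;
}

// Indices in one continuous indexed triangle strip: the first triangle
// takes three, and each later triangle adds one.
constexpr std::size_t indexedStripIndexCount(std::size_t triangles) {
    return triangles == 0 ? 0 : triangles + 2;
}

// Unique vertices in a grid of quads, the data both indexed forms share.
constexpr std::size_t gridVertexCount(std::size_t columns, std::size_t rows) {
    return (columns + 1) * (rows + 1);
}
```

A row of 10 quads has 20 triangles. Without indices that is 60 full vertices. With indices it is 22 unique vertices plus either 60 indices (list) or 22 indices (strip), and an index is far smaller than a vertex.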
Batching is always something to keep in mind. The cost of an unoptimized mesh is
not only the search through the attribute buffer in memory. It
is also a good idea to optimize the mesh to limit the number of times you set materials
and textures. I have heard anecdotes about developers who render their objects
differently for even and odd frames to take advantage of device states that were set on
the previous frame. This is probably not an extreme you need to go to, but it shows
the importance of thinking about how you use the vertices and the device.
Finally, in this chapter, you’ve started to set different render states
such as lighting and culling. I discuss these features in their own chapter soon, but
it is easy to forget that these things are enabled and costing resources. If you
are not using something like lighting, remember to turn it off. But first, remember
the rule about batching. Do not toggle lighting on and off for each object. Render all lit
objects first with lighting on, and then turn it off for all the objects that do not use lighting.

Performance Considerations
Textures take up a lot of memory. That affects not only storage space, but also
rendering time because all of that memory must be pushed through the pipeline as
objects are rendered. As I mentioned before, if there are ways to get by with smaller
textures, do so. If a texture has a lot of repetition, create a smaller version of that
texture and tile it using texture coordinates greater than 1.0.
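
To get a feel for the numbers, the sketch below computes the size of one uncompressed texture level and shows what a coordinate greater than 1.0 resolves to. It assumes the wrap addressing mode is in effect, and it ignores mipmaps and driver padding.

```cpp
#include <cmath>
#include <cstddef>

// Bytes for one uncompressed texture level. Mipmaps and any driver padding
// are ignored, so treat the result as a lower bound.
constexpr std::size_t textureBytes(std::size_t width, std::size_t height,
                                   std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Where a texture coordinate lands inside the texture when addressing wraps:
// only the fractional part matters, so 3.25 samples the same texel as 0.25.
float wrapCoordinate(float u) {
    return u - std::floor(u);
}
```

A 1024x1024 texture at 4 bytes per pixel is 4,194,304 bytes. A 256x256 version tiled four times in each direction covers the same surface with 262,144 bytes, which is one sixteenth of the memory.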
On the flip side, if you have a lot of very small textures, you might want to place
them all into one medium-sized texture and use texture coordinates to index into
the different regions. This is the idea behind the text-rendering functions I talk
about in later chapters. The downside of this is that it does not allow you to tile the
subtextures in many cases, so sometimes you just need to experiment to see what’s
best for your application.
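
Indexing into a combined texture is mostly bookkeeping. Here is a minimal sketch that assumes the small textures are all the same size and packed in a uniform grid. UvRect and atlasCell are hypothetical names, and the sketch does not inset the coordinates to guard against filtering bleeding in from neighboring cells.

```cpp
// Texture coordinates of one rectangular region inside a larger texture.
struct UvRect {
    float u0, v0, u1, v1;
};

// Texture coordinates of one cell in an atlas laid out as a uniform grid.
// Cells are numbered left to right, then top to bottom.
UvRect atlasCell(int index, int columns, int rows) {
    const float cellWidth = 1.0f / static_cast<float>(columns);
    const float cellHeight = 1.0f / static_cast<float>(rows);
    const int column = index % columns;
    const int row = index / columns;
    return { column * cellWidth, row * cellHeight,
             (column + 1) * cellWidth, (row + 1) * cellHeight };
}
```

In a 4x4 atlas, cell 5 sits in the second column of the second row, so its coordinates run from (0.25, 0.25) to (0.5, 0.5). Because every coordinate stays inside that rectangle, values greater than 1.0 can no longer be used to tile a subtexture, which is the downside mentioned above.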
You also don’t want to call SetTexture any more than you have to because setting the
texture is expensive. The previous code snippet demonstrates the basic usage of
SetTexture, but take it with a grain of salt. You do not need to set all the textures to
NULL at the end of every frame. That can be expensive, and it wouldn’t really
accomplish anything. Only set textures when you need to, and batch textured
objects together as much as possible.
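
One way to avoid redundant calls is to remember what is already bound. The class below is a hypothetical sketch rather than part of Direct3D: it tracks an opaque pointer per stage and tells the caller whether a real SetTexture call is needed. The limit of eight stages is an assumption; check your device caps.

```cpp
#include <array>
#include <cstddef>

// Hypothetical wrapper that remembers which texture is bound to each stage
// and reports whether a real SetTexture call is needed. Textures are
// identified by opaque pointers; no Direct3D types are used here.
class TextureStateCache {
public:
    static constexpr std::size_t kStages = 8;  // assumed stage limit

    // Returns true if the caller should issue the real call, false if the
    // requested texture is already bound to that stage.
    bool needsSet(std::size_t stage, const void* texture) {
        if (stage >= kStages) return false;  // out of range: nothing to do
        if (known_[stage] && bound_[stage] == texture) return false;
        bound_[stage] = texture;
        known_[stage] = true;
        ++setCount_;
        return true;
    }

    // Number of times a real call was requested so far.
    std::size_t setCount() const { return setCount_; }

private:
    std::array<const void*, kStages> bound_{};
    std::array<bool, kStages> known_{};
    std::size_t setCount_ = 0;
};
```

A cache like this only removes calls that repeat the current state. It does not replace sorting your objects by texture, which is what keeps the number of genuine changes low.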
Performance, as it relates to textures, is heavily dependent on your hardware and
your application. Whenever possible, I give more performance tips, but in reality
you should understand some of these basic tips and then experiment to see what
works best for your specific situation.

Performance Considerations
Any of these tests takes time. If you are doing something extremely simple and you
don’t need depth testing, turn it off. If you don’t have any transparency, do not
enable alpha blending. Also, if you have only a few transparent objects, turn
blending on for those objects but off for everything else.
The alpha test, on the other hand, can be your friend. Performing the alpha test is
much cheaper than alpha blending, so the more pixels you can discard with the
test, the better off you’ll be. Of course, if you know that everything will pass the test
because you have no transparent objects, make sure you disable the test.
The performance usage of these tests is mostly common sense. Remember that
these are simple operations, but they are occurring for each and every pixel, usually
several times per pixel in complex scenes. Even when running at 640x480, this can
mean a couple million tests per frame. In cases where they help you, turn them on.
Turn them off when they don’t help you.
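
The "couple million" figure is simple multiplication. In the sketch below, the overdraw and the number of enabled tests are estimates you supply, not values Direct3D reports.

```cpp
#include <cstddef>

// Rough count of per-pixel tests in one frame. overdraw is the average
// number of times each pixel is written; testsPerWrite is how many tests
// (depth, alpha, and so on) are enabled. Both are estimates you supply.
constexpr std::size_t testsPerFrame(std::size_t width, std::size_t height,
                                    std::size_t overdraw,
                                    std::size_t testsPerWrite) {
    return width * height * overdraw * testsPerWrite;
}
```

At 640x480 there are 307,200 pixels. With an average overdraw of 3 and two tests enabled, that is 1,843,200 tests per frame.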

Performance Considerations
As you will see later, mesh cloning requires a fair amount of code that
locks and unlocks buffers and moves a lot of data around. It’s a fairly
expensive operation, but it’s difficult to think of any time that you would need to do
this while you’re actually rendering. You usually do this during initialization.

The one thing to consider here is memory usage. Remember that a cloned mesh is a
new mesh. If you clone and then do not release the old mesh, that mesh occupies
memory. It could also possibly be in video memory, which is more precious. Also,
mesh-specific buffers such as the attribute buffers don’t really do anything for you, so
you might consider getting rid of the mesh altogether. The one thing to remember is
that you are creating new objects. Keep an eye on what is actually being used.

Performance Considerations
If you apply these techniques to a 3D scene, the performance hit should be roughly
equivalent to the performance hit of Chapter 33. The pixel shaders might be more
complicated here, but these techniques usually do not need to enable alpha
blending (unless you are also implementing motion blur as in Chapter 33). I admit
that there still might not be enough spare horsepower to justify these techniques,
but they might be worth it in some cases. They might also allow you to optimize
other areas such as lighting if you know that the post-processing technique will
obscure some effects.
Of course, the performance hit will be very high if you try to run a version 1.4
shader on hardware that does not support it. At the moment, many of these
techniques do not enjoy ubiquitous support, but they will in the future.
One final performance point to remember: This technique is independent of
scene complexity. The technique “costs” the same regardless of how many polygons
you have.

Performance Issues
I like the per-pixel technique, but it does have a downside in terms of performance.
Reading a render target is still a fairly expensive operation, but the effect isn’t too
bad if you’re only responding to the occasional click. At the moment, reading the
render target is one performance-sensitive operation that can’t really be avoided.
The good news is that this cost is constant no matter how complex the scene is.
The other potential performance hit comes from the fact that you need to render
the scene in an extra pass, but this might not be as costly as you think. First of all,
the picking pass should disable textures, lighting, pixel shaders, and all other forms
of color processing. This should account for a lot of savings in terms of time spent
rendering the pick pass. You can also avoid rendering any object that cannot be
picked. For instance, if the user can only pick other players, you might be able to
get by rendering only the player models. However, if the players can be behind
walls, you must render those walls as well to get the proper occlusion of players. You
can definitely avoid rendering lens flares, sky boxes, and so on.
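
Disabling all color processing matters for correctness as well as speed if the pick pass identifies objects by color. The sketch below assumes that approach: each pickable object is drawn in a flat color that encodes its ID, into a 32-bit target laid out as 0xAARRGGBB. The function names are hypothetical, and anything that alters the written color (lighting, fog, blending, dithering) would corrupt the ID.

```cpp
#include <cstdint>

// Pack an object ID into a 32-bit color laid out as 0xAARRGGBB, with alpha
// forced to fully opaque. Only the low 24 bits of the ID are kept.
constexpr std::uint32_t idToColor(std::uint32_t id) {
    return 0xFF000000u | (id & 0x00FFFFFFu);
}

// Recover the object ID from a pixel read back from the pick target.
constexpr std::uint32_t colorToId(std::uint32_t color) {
    return color & 0x00FFFFFFu;
}
```

Twenty-four bits allow for more than 16 million distinct objects. Reserving one value, such as 0, for the background lets you tell "nothing picked" apart from a real object.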

So, two distinct factors affect performance. In this simple example, the picking
operation cuts the frame rate approximately in half, but the majority of the cost
comes from the fact that you are rendering two nearly identical passes. In more
realistic, complex scenes, the pick pass might be much less complex than the visible
pass, meaning that the proportional performance hit is not as large. This technique
bears experimentation, but the point is that you should not be discouraged by the
performance specs from this simple application. A 50 percent hit in this example
does not necessarily mean that you will get a 50 percent hit in real applications. As
always, you should experiment.
