DirectX Performance Considerations

Performance Considerations
One of the points of knowing all this background is to understand that the pipeline
is full of bottlenecks waiting to happen. A chain is only as strong as its weakest
link, and every stage in the pipeline is a potential weak link. Looking back, before 3D
cards, video cards were mostly measured by their fill rate. Fill rate is basically the
measure of how fast the card can paint pixels on the screen. When cards were slower
than the CPU feeding them, applications were “fill rate limited.” When fast 2D cards
became available, the CPUs could not process the geometry fast enough, and the
applications could be “geometry limited.” Here are a few things to watch out for:
■ Be aware of how often you are manipulating data on the CPU. It doesn’t
matter how fast your CPU or GPU is if the path between them is slow to
traverse. The system bus is very slow compared to the data paths through the
card itself.
■ Be aware of how much geometry is actually being pumped through the
geometry pipeline, including geometry that you don’t necessarily see.
Because of clipping, you might see only a little simple geometry on the
screen, but there might be a lot running through your shader.
■ Be aware of your texture size. Cards are increasingly limited by their ability to
move textures from memory to the screen. The processors are fast enough
that the speed of the memory is actually the bottleneck! Don’t use enormous
textures if you don’t have to.
■ Be aware of what your texture blending is doing. Make sure you’re not doing
more than you have to.
■ Be aware of which tests you have enabled. If you don’t need the depth test,
make sure it is disabled.
I go into these and many other performance considerations in the following
chapters. The performance point to remember here is that you need to keep in
mind each phase of the pipeline has its own performance characteristics and
considerations. If your card says that it can render 100 million polygons, but you are
only getting 10 million, you may be limited by something you’re doing with
textures. Knowing how the pipeline works and how hard your application hits each
part will help you sort out performance problems. 

Performance Considerations
Before I say anything about performance, remember that performance is ultimately
a function of what exactly you’re trying to do. Therefore, these are some rules of
thumb, but you should always experiment to find the best approach for you. The
following is a laundry list of things to consider:
■ In general, locking the vertex buffer is a costly operation and should be
avoided when possible. Often a lock is necessary, but I’ve heard horror stories
of people locking, setting three vertices, rendering, and repeating. Instead,
add many more vertices in a single lock.
■ Be very aware of the flags used when creating and locking the vertex buffer.
Different flags or flag combinations can make measurable performance
differences. When in doubt, experiment until you find the best setup.
■ Usually, the more vertices you can send to the card in a single call, the better.
The numbers differ for different cards and different applications, but a
general rule of thumb is that the number should be in the thousands, not in
the tens of vertices.
■ Calling SetVertexShader is a costly operation. It forces the device to stop and
prepare for the new type of data. If you’re dealing with different FVFs, group
your geometry by FVF to avoid switching back and forth. Some people even
go so far as to render every other frame in different orders to avoid
unnecessary switches!
■ Because switching formats is costly, create formats based on attributes, not on
objects. Therefore, don’t create one format for cars and another format for
airplanes if they share the same attributes. Instead, create one format for
single textured vertices, another for multitextured vertices, and so on, and
use the correct format for all appropriate objects.
■ Setting the stream source is also a costly operation. Again, rendering calls
should be batched together to take the most advantage of a single vertex
buffer before switching.
■ As stated earlier, the more geometry you can process in a single call, the
better. This applies to DrawPrimitive as well. In some cases, you might not be
able to send thousands of vertices at the same time, but in general, do as
much as you can.

Most of the rules of thumb can be summed up by the frequently heard rule
“Batching is everything.” Always maximize the amount of work you can do in a
single call, and minimize the number of times you force the device to switch to a
new state. In my work, I almost never lock the vertex buffer (except during
initialization), and I minimize how often I reset anything. If you are frequently
locking the buffer or making format changes, take a good long look and ask
yourself whether you really need to do that. Chances are pretty good that you’ll find
many ways to optimize your code. But remember, the best way to optimize is by
experimenting and finding the best approach for the given task.
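
The grouping advice above can be illustrated without touching the device at all. The following is only a minimal sketch: `DrawItem`, `countStateSwitches`, and `sortByState` are hypothetical names, not Direct3D types or calls. It assumes each queued draw call is described by a format code and a vertex buffer id, and it simply counts how often the format or stream source would have to change for a given submission order. In a real application, the places where the counter increments are where SetVertexShader or the stream source call would be issued.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical record for one queued draw call (not a Direct3D type).
struct DrawItem {
    std::uint32_t fvf;    // vertex format code the call needs
    int vertexBufferId;   // which vertex buffer the call reads from
    int vertexCount;      // number of vertices submitted by the call
};

// Counts how many times the format or the stream source would have to be
// set if the items were submitted in the given order. The first item always
// pays for both.
inline int countStateSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].fvf != items[i - 1].fvf) ++switches;
        if (i == 0 || items[i].vertexBufferId != items[i - 1].vertexBufferId) ++switches;
    }
    return switches;
}

// Reorders the queue so that calls sharing a format, and then a vertex
// buffer, end up next to each other.
inline void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) {
            return std::tie(a.fvf, a.vertexBufferId) <
                   std::tie(b.fvf, b.vertexBufferId);
        });
}
```

With four calls that alternate between two formats and two buffers, the unsorted order pays for eight state sets and the sorted order pays for four. Whether that saving matters in practice still depends on the card and the application, so measure before and after.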

 

Performance Considerations
There aren’t many performance considerations to worry about when setting
transforms. Of course, there is a slight cost associated with setting transforms and
computing matrices, but it’s minimal. In general, batching is always a good thing,
but don’t worry too much about the cost of SetTransform. In fact, a call to
SetTransform is probably cheaper than locking the vertices and transforming them
on the CPU. In most cases, it is best to take advantage of the hardware T&L and use
matrices to transform data. It is much cheaper to send 16 floating-point values to
the card than it is to send several thousand new vertices.
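
To put rough numbers on that last comparison, here is a small sketch of the arithmetic. The function names are made up for illustration, and the vertex count and stride in the note below are assumptions, not measurements.

```cpp
#include <cstddef>

// Bytes needed to send one 4x4 matrix of 32-bit floats.
constexpr std::size_t matrixBytes() {
    return 16 * sizeof(float);
}

// Bytes needed to resend a vertex array after transforming it on the CPU.
constexpr std::size_t vertexUploadBytes(std::size_t vertexCount,
                                        std::size_t strideBytes) {
    return vertexCount * strideBytes;
}
```

Assuming 5,000 vertices at a hypothetical 32 bytes each (position, normal, and one set of texture coordinates), `vertexUploadBytes(5000, 32)` is 160,000 bytes, against 64 bytes for the matrix on platforms where a float is 4 bytes.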

Performance Considerations
I’ve hinted at several performance considerations throughout this chapter, but it is
worthwhile to reiterate some of them here. First, as always, it’s best to keep the
number of vertices as low as possible. This is a matter of not only dealing with
optimized 3D assets, but also rendering those assets as efficiently as possible. As I’ve
shown, the best way to render geometry is with either indexed strips or indexed lists.
Indexed lists might be slightly less efficient, but overall they are easier to work with.
Batching is always something to keep in mind. With meshes, the cost of searching
through the attribute buffer is only part of the issue. It is also a good idea to
optimize the mesh to limit the number of times you set materials
and textures. I have heard anecdotes about developers who render their objects
differently for even and odd frames to take advantage of device states that were set on
the previous frame. This is probably not an extreme you need to go to, but it shows
the importance of thinking about how you use the vertices and the device.
Finally, in this chapter, you’ve started to set render states such as lighting and
culling. I discuss these features in their own chapter soon, but it is easy to forget
that these things are enabled and costing resources. If you are not using something
like lighting, remember to turn it off. But first, remember the rule about batching.
Do not toggle lighting on and off for each object. Render all lit objects first with
lighting on, and then turn it off for all the objects that do not use lighting.
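
Returning to the indexed lists mentioned above, a flat grid of quads makes the vertex savings easy to see. This is a minimal sketch in plain C++ with hypothetical helper names. It assumes 16-bit indices, so the caller must keep the total vertex count at or below 65,536, and the winding order used here is an arbitrary choice that may need flipping to match your cull mode.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Vertices needed when every triangle carries its own three vertices.
inline std::size_t nonIndexedVertexCount(std::size_t cols, std::size_t rows) {
    return cols * rows * 6;  // two triangles per quad, three vertices each
}

// Vertices needed when shared corners are stored once and referenced by index.
inline std::size_t indexedVertexCount(std::size_t cols, std::size_t rows) {
    return (cols + 1) * (rows + 1);
}

// Builds an indexed triangle list for a cols x rows grid whose vertices are
// stored row by row, (cols + 1) vertices per row.
inline std::vector<std::uint16_t> buildGridIndices(unsigned cols, unsigned rows) {
    std::vector<std::uint16_t> indices;
    indices.reserve(static_cast<std::size_t>(cols) * rows * 6);
    const unsigned stride = cols + 1;
    for (unsigned r = 0; r < rows; ++r) {
        for (unsigned c = 0; c < cols; ++c) {
            const unsigned topLeft = r * stride + c;
            const unsigned topRight = topLeft + 1;
            const unsigned bottomLeft = topLeft + stride;
            const unsigned bottomRight = bottomLeft + 1;
            const unsigned quad[6] = {topLeft, topRight, bottomLeft,
                                      bottomLeft, topRight, bottomRight};
            for (unsigned index : quad) {
                indices.push_back(static_cast<std::uint16_t>(index));
            }
        }
    }
    return indices;
}
```

For a 10 by 10 grid, the non-indexed list needs 600 vertices, while the indexed list needs 121 vertices plus 600 small indices.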

 

Performance Considerations
Textures take up a lot of memory. That affects not only storage space, but also
rendering time because all of that memory must be pushed through the pipeline as
objects are rendered. As I mentioned before, if there are ways to get by with smaller
textures, do so. If a texture has a lot of repetition, create a smaller version of that
texture and tile it using texture coordinates greater than 1.0.
On the flip side, if you have a lot of very small textures, you might want to place
them all into one medium-sized texture and use texture coordinates to index into
the different regions. This is the idea behind the text-rendering functions I talk
about in later chapters. The downside of this is that it does not allow you to tile the
subtextures in many cases, so sometimes you just need to experiment to see what’s
best for your application.
You also don’t want to call SetTexture any more than you have to because setting the
texture is expensive. The previous code snippet demonstrates the basic usage of
SetTexture, but take it with a grain of salt. You do not need to set all the textures to
NULL at the end of every frame. That can be expensive, and it wouldn’t really
accomplish anything. Only set textures when you need to, and batch textured
objects together as much as possible.
Performance, as it relates to textures, is heavily dependent on your hardware and
your application. Whenever possible, I give more performance tips, but in reality
you should understand some of these basic tips and then experiment to see what
works best for your specific situation.
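
The idea of packing small textures into one larger texture comes down to remapping texture coordinates. The sketch below assumes the simplest possible layout, a texture divided into equally sized cells; `UV`, `AtlasRegion`, `atlasCell`, and `toAtlas` are hypothetical names used only for illustration.

```cpp
// A texture coordinate pair.
struct UV {
    float u;
    float v;
};

// The rectangle a subtexture occupies inside the combined texture.
struct AtlasRegion {
    float uMin;
    float vMin;
    float uMax;
    float vMax;
};

// Region of the cell at (col, row) in a texture split into cols x rows
// equally sized cells.
inline AtlasRegion atlasCell(int col, int row, int cols, int rows) {
    const float cellW = 1.0f / static_cast<float>(cols);
    const float cellH = 1.0f / static_cast<float>(rows);
    return {col * cellW, row * cellH, (col + 1) * cellW, (row + 1) * cellH};
}

// Maps a coordinate in the subtexture's own [0, 1] range into the region.
inline UV toAtlas(const AtlasRegion& r, UV local) {
    return {r.uMin + local.u * (r.uMax - r.uMin),
            r.vMin + local.v * (r.vMax - r.vMin)};
}
```

This also shows why tiling stops working: a local coordinate greater than 1.0 no longer wraps around the subtexture but lands in a neighboring cell.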

 

Performance Considerations
Any of these tests takes time. If you are doing something extremely simple and you
don’t need depth testing, turn it off. If you don’t have any transparency, do not
enable alpha blending. Also, if you have only a few transparent objects, turn
blending on for those objects but off for everything else.
The alpha test, on the other hand, can be your friend. Performing the alpha test is
much cheaper than alpha blending, so the more pixels you can discard with the
test, the better off you’ll be. Of course, if you know that everything will pass the test
because you have no transparent objects, make sure you disable the test.
The performance usage of these tests is mostly common sense. Remember that
these are simple operations, but they are occurring for each and every pixel, usually
several times per pixel in complex scenes. Even when running at 640x480, this can
mean a couple million tests per frame. In cases where they help you, turn them on.
Turn them off when they don’t help you.
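
The "couple million" figure is easy to check with a rough count. This sketch uses a made-up helper name, and the overdraw and tests-per-fragment values in the note below are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Rough number of per-pixel tests in one frame: every screen pixel is
// touched `overdraw` times on average, and each touch runs
// `testsPerFragment` enabled tests.
constexpr std::uint64_t testsPerFrame(std::uint64_t width,
                                      std::uint64_t height,
                                      std::uint64_t overdraw,
                                      std::uint64_t testsPerFragment) {
    return width * height * overdraw * testsPerFragment;
}
```

At 640x480 there are 307,200 pixels. With an assumed average overdraw of 3 and two tests enabled, `testsPerFrame(640, 480, 3, 2)` comes to 1,843,200 tests for a single frame.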

 

Performance Considerations
As you will see later, mesh cloning involves a decent amount of code that involves
locking and unlocking buffers and moving a lot of data around. It’s a fairly
expensive operation, but it’s difficult to think of any time that you would need to do
this while you’re actually rendering. You usually do this during initialization.

The one thing to consider here is memory usage. Remember that a cloned mesh is a
new mesh. If you clone and then do not release the old mesh, that mesh occupies
memory. It could also possibly be in video memory, which is more precious. Also,
mesh-specific buffers such as the attribute buffers don’t really do anything for you, so
you might consider getting rid of the mesh altogether. The one thing to remember is
that you are creating new objects. Keep an eye on what is actually being used.

 

Performance Considerations
If you apply these techniques to a 3D scene, the performance hit should be roughly
equivalent to the performance hit of Chapter 33. The pixel shaders might be more
complicated here, but these techniques usually do not need to enable alpha
blending (unless you are also implementing motion blur as in Chapter 33). I admit
that there still might not be enough spare horsepower to justify these techniques,
but they might be worth it in some cases. They might also allow you to optimize
other areas such as lighting if you know that the post-processing technique will
obscure some effects.
Of course, the performance hit will be very high if you try to run a version 1.4
shader on hardware that does not support it. At the moment, many of these
techniques do not enjoy ubiquitous support, but they will in the future.
One final performance point to remember: This technique is independent of
scene complexity. The technique “costs” the same regardless of how many polygons
you have.

 

Performance Issues
I like the per-pixel technique, but it does have a downside in terms of performance.
Reading a render target is still a fairly expensive operation, but the effect isn’t too
bad if you’re only responding to the occasional click. At the moment, reading the
render target is one performance-sensitive operation that can’t really be avoided.
The good news is that this cost is constant no matter how complex the scene is.
The other potential performance hit comes from the fact that you need to render
the scene in an extra pass, but this might not be as costly as you think. First of all,
the picking pass should disable textures, lighting, pixel shaders, and all other forms
of color processing. This should account for a lot of savings in terms of time spent
rendering the pick pass. You can also avoid rendering any object that cannot be
picked. For instance, if the user can only pick other players, you might be able to
get by rendering only the player models. However, if the players can be behind
walls, you must render those walls as well to get the proper occlusion of players. You
can definitely avoid rendering lens flares, sky boxes, and so on.

So, two distinct factors affect performance. In this simple example, the picking
operation cuts the frame rate approximately in half, but the majority of the cost
comes from the fact that you are rendering two nearly identical passes. In more
realistic, complex scenes, the pick pass might be much less complex than the visible
pass, meaning that the proportional performance hit is not as large. This technique
bears experimentation, but the point is that you should not be discouraged by the
performance specs from this simple application. A 50 percent hit in this example
does not necessarily mean that you will get a 50 percent hit in real applications. As
always, you should experiment.
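
Because the pick pass disables all color processing, each object can be drawn in a single flat color. One common way to use that, assumed here rather than taken from the sample code, is to encode an object id in the color and decode it from the pixel under the cursor. The sketch below works on a CPU-side copy of the pick target; all names are hypothetical, and the expensive step of reading the render target back is outside its scope.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs an object id into an opaque 32-bit ARGB color. Only the low 24 bits
// of the id are kept; alpha is forced to 0xFF.
constexpr std::uint32_t idToColor(std::uint32_t id) {
    return 0xFF000000u | (id & 0x00FFFFFFu);
}

// Recovers the object id from a color written by idToColor.
constexpr std::uint32_t colorToId(std::uint32_t argb) {
    return argb & 0x00FFFFFFu;
}

// Looks up the id at pixel (x, y) in a row-major copy of the pick target.
// The caller must make sure (x, y) lies inside the image.
inline std::uint32_t pickAt(const std::vector<std::uint32_t>& pixels,
                            int width, int x, int y) {
    const std::size_t offset = static_cast<std::size_t>(y) *
                               static_cast<std::size_t>(width) +
                               static_cast<std::size_t>(x);
    return colorToId(pixels[offset]);
}
```

Reserving id 0 for "nothing picked" and clearing the pick target to `idToColor(0)` keeps the lookup unambiguous when the user clicks on empty space.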
