VideoMemory, SystemMemory And AGPMemory

The original thread is reproduced below. My notes:

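DxFlag:
Flags such as D3DUSAGE_DYNAMIC, D3DPOOL_DEFAULT, D3DPOOL_MANAGED, and D3DPOOL_SYSTEMMEM are hints that tell the driver how we intend to use a resource (static, dynamic, and so on). How the resource is actually managed is still up to the driver.
Some general tendencies:
POOL:
  • D3DPOOL_DEFAULT: the resource is placed according to its usage, generally in video memory or AGP memory.
  • D3DPOOL_MANAGED: a backup copy is kept in system memory and copied to video/AGP memory when necessary.
  • D3DPOOL_SYSTEMMEM: the resource lives purely in system memory. It is generally not used for work that involves the GPU directly, but for things like UpdateSurface.
Usage: gives the driver an effective hint about which kind of memory to allocate.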
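The tendencies above can be written down as a small lookup. This is only a sketch in plain C++: `Pool`, `Usage`, `Memory`, `Placement`, and `likelyPlacement` are hypothetical names invented for illustration, not the real D3D9 types, and an actual driver is free to place resources differently.

```cpp
#include <cassert>

// Hypothetical stand-ins for the pool and usage hints discussed above.
// These are NOT the real D3DPOOL / D3DUSAGE definitions.
enum class Pool { Default, Managed, SystemMem };
enum class Usage { Static, Dynamic };

// Hypothetical names for the three classes of memory.
enum class Memory { Video, Agp, System };

// Where the resource is expected to live, and whether a system-memory
// backup copy is expected to exist alongside it.
struct Placement {
    Memory primary;
    bool hasSystemBackup;
};

// Encodes the tendencies from the notes. Assumption: this is a typical
// outcome only; the driver has the final say.
Placement likelyPlacement(Pool pool, Usage usage) {
    switch (pool) {
    case Pool::SystemMem:
        // Lives purely in system memory.
        return {Memory::System, false};
    case Pool::Managed:
        // Backed up in system memory and copied to video/AGP memory when
        // needed. Video is used here as a simplification.
        return {Memory::Video, true};
    case Pool::Default:
        // Placed according to usage: dynamic data tends to land in AGP
        // memory, static data in video memory.
        return {usage == Usage::Dynamic ? Memory::Agp : Memory::Video, false};
    }
    return {Memory::System, false};  // unreachable for valid enum values
}
```

For example, `likelyPlacement(Pool::Default, Usage::Dynamic).primary` evaluates to `Memory::Agp` in this model.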

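      AGP memory
      How much is allocated is decided by the BIOS. There is a viewer here: http://developer.nvidia.com/object/agp_memoryapp.html
      On my system, 500+ MB of AGP memory was allocated. Memory allocated this way is carved out of system memory, so the CPU can no longer use it for anything other than graphics.
      AGP memory is uncached, so CPU reads and writes are not fast, though they are faster than CPU reads and writes to video memory. GPU reads are not fast either; they are slower than reads from video memory.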

      Lock
      The driver optimizes locks with techniques like "buffer renaming", so flags such as discard have a real positive effect on lock efficiency.

      I first saw this here: http://www.cnitblog.com/playerwing/archive/2007/10/05/34407.html
      which led to the original thread: http://www.gamedev.net/topic/388869-whats-the-difference-between-these-2-things/
      Those two identifiers are hints to the driver for how the buffer will be used, to optimize how the card accesses the data. They make sense even without AGP memory.
      On systems with AGP memory, there are three classes of memory:
      1) System Memory. This is cached, and reasonably fast to read from and write to with the CPU. However, it typically needs an additional copy before the graphics card can use it. System and scratch pool memory goes here.
      2) AGP Memory. This is still CPU-local RAM, but it is not cached. This means that it's slow to read from, and it's slow to write to, UNLESS you write sequentially, without doing too much other memory traffic in between, and overwrite every byte, so that the write combiners don't need to fetch lines from RAM to do a combine. Thus, generating software-transformed vertices as a stream into this buffer might still be fast. For the GPU, the AGP memory is directly accessible, so no additional copy is needed. Dynamic pool memory goes here.
      3) Video Memory. This is RAM that's local to the GPU. It typically has insanely high throughput. It is accessible over the bus for the CPU, but going over the bus is really slow; typically both for reading and for writing. Thus writing directly into this memory (or even worse, reading out of it), is not recommended. Default pool memory goes here.
      On systems with PCI-Express, some of the AGP vs system memory differences are reduced, but the usage hints you're giving the driver ("I will change the data by writing it sequentially" vs "I will not change the data much") are still useful for optimizing performance.
      Video memory is the memory chips physically located on the card. The card can easily access this memory, while reading it from the CPU is extremely slow.
      AGP memory is a part of your main memory on the motherboard that has been set aside for talking to the graphics card. The card and your CPU can access this memory at a decent speed.
      This page shows that your BIOS "AGP aperture size" controls the size of your AGP memory, and explains how "reducing the AGP aperture size won't save you any RAM. Again, what setting the AGP aperture size does is limit the amount of RAM the AGP bus can appropriate when it needs to. It is not used unless absolutely necessary. So, setting a 64MB AGP aperture doesn't mean 64MB of your RAM will be used up as AGP memory. It will only limit the maximum amount that can be used by the AGP bus to 64MB (with a usable AGP memory size of only 26MB)."
      1) video memory can mean one of two things depending on the context the term is used in:
      a. video memory is generally any memory which is used by the graphics chip.
      b. video memory (correctly "local video memory") is memory that exists on the graphic card itself (i.e. RAM chips that live on the graphics card, they are 'local' to the graphics chip).
      2) AGP memory is main memory on your system motherboard that has been specially assigned for graphics use. The "AGP Aperture" setting in your system BIOS controls this assignment. The more you have assigned for AGP use, the less you have for general system use. AGP memory is sometimes also known as "non-local video memory".
      3a) 'Local' video memory is very fast for the graphics chip to read from and write to because it is 'local' to the graphics chip.
      3b) 'Local' video memory is extremely slow to read from using the system CPU, and reasonably slow to write to using the system CPU.
      This is for a number of reasons; partly because the memory is physically on a different board (the graphics card) to the CPU (i.e. it's not 'local' for the CPU); partly because that memory isn't cached at all for reads using the CPU, and only burst cached for writes; partly due to the way data transfers over bus standards such as AGP must be done.
      4a) AGP memory is reasonably fast for the graphics chip to read from or write to, but not as fast as local video memory.
      4b) AGP memory is fairly slow to read from using the system CPU because it is marked as "Write Combined" so any reads don't benefit from the L2 and L1 caches (i.e. each read is effectively a cache-miss). 
      AGP memory is however faster than local video memory to read from using the CPU since it is local to the CPU.
      4c) AGP memory is reasonably fast to write to using the system CPU. Although not fully cached, "Write Combined" memory uses a small buffer that collects sequential writes to memory (32 or 64 bytes IIRC) and writes them out in one go. This is why sequential access of vertex data using the CPU is preferable for performance.
      5) D3DUSAGE_DYNAMIC is only a hint to the display driver about how you intend to use that resource. Usually it will give you AGP memory, but it isn't guaranteed (so don't rely on it!).
      6) Generally, vertex buffers which you need to Lock() and update using the CPU regularly at runtime should be D3DUSAGE_DYNAMIC, and all others should be static.
      7) Graphics drivers use techniques such as "buffer renaming" where multiple copies of the buffer are created and cycled through to reduce the chance of stalls when dynamic resources are locked. This is why it's essential to use the D3DLOCK_DISCARD and D3DLOCK_NOOVERWRITE locking flags correctly if you want good performance. It's also one of the many reasons you shouldn't rely on the data pointer from a Lock() after the resource has been unlocked.
      8) General advice for good performance:
      - treat all graphics resources as write-only for the CPU, particularly those in local video memory. CPU reads from graphics resources are a recipe for slowness.
      - CPU writes to locked graphics resources should be done sequentially.
      - it's better to write all of a vertex out to memory with the CPU than it is to skip elements of it. Skipping can harm the effectiveness of write combining, and even cause hidden reads in some situations (and reads are bad - see above).
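The write-only, sequential, whole-vertex advice can be sketched as follows. This is a minimal illustration in plain C++ rather than real Direct3D code: the `Vertex` layout and the `fillSequential` and `buildStrip` helpers are hypothetical, and a `std::vector` stands in for the pointer a real Lock() call would return.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout, used only for this illustration.
struct Vertex {
    float x, y, z;  // position
    float u, v;     // texture coordinates
};

// Writes `count` vertices to `dst` from front to back. Each vertex is built
// in a local variable (ordinary cached memory) and then stored whole, so the
// destination is written in order, no element is skipped, and `dst` is never
// read back.
void fillSequential(Vertex* dst, const float* heights, std::size_t count) {
    assert(dst != nullptr && heights != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        Vertex v;
        v.x = static_cast<float>(i);
        v.y = heights[i];
        v.z = 0.0f;
        v.u = count > 1
                  ? static_cast<float>(i) / static_cast<float>(count - 1)
                  : 0.0f;
        v.v = 0.0f;
        dst[i] = v;  // one complete, in-order store per vertex
    }
}

// Stand-in for "lock, fill, unlock": allocates the destination and fills it.
std::vector<Vertex> buildStrip(const std::vector<float>& heights) {
    std::vector<Vertex> buffer(heights.size());
    if (!heights.empty()) {
        fillSequential(buffer.data(), heights.data(), heights.size());
    }
    return buffer;
}
```

The pattern to avoid is the opposite one: updating only `y` in place (`dst[i].y = ...`) or computing a value from what is already in `dst`, since both leave gaps in the write stream or read from the destination.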
      Since the "local video memory" is fast for the video card to manipulate, and the video card is dedicated to graphics processing, why bother to use the "AGP memory"?
      Is that only because the "local video memory" may not be enough for graphics data storage?
      What role does the CPU play in the graphics process?
      Yes. That's one of the main reasons. AGP comes from a time (~10 years ago!) when a typical graphics card would have, say, 2MB of local video memory and a typical PC system had 64-128MB of main system memory, so it made sense to set some system memory aside for situations where there wasn't enough local memory.
      In these days of monster graphics cards with 512MB of local video memory, it's less likely used as an overflow.
      Another reason is dynamic graphics data - any data that needs to be regularly modified with the CPU is usually better off in AGP memory (it's write combined, but it's local to the CPU too, so uses less CPU time to access)
      Not very much these days. Mostly application-side jobs like writing vertex data into locked buffers, object culling, traversing scene graphs, loading resources into main memory, things like that. 
      On the D3D and device driver side: handling the D3D API, swizzling and other conversion when some types of resources are locked/unlocked [I believe some GPUs can even do their own swizzling now though], and setting up the command buffer for the GPU.
      Before hardware T&L, the CPU also handled all vertex processing.
      The fact that modern GPUs now handle so much of the graphics pipeline makes avoiding unnecessary serialization between CPU and GPU all the more important (i.e. stalls where one has a resource locked and the other wants to use it), thus things like buffer renaming. Serialization between CPU and GPU throws away the GPU's processing ability.
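The idea behind buffer renaming can be illustrated with a toy model. This is a sketch under stated assumptions, not how any particular driver works: `RenamedBuffer` and all of its members are hypothetical names, and the "GPU" is just a set of busy flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of buffer renaming. One logical buffer is backed by several
// physical copies, and a discard-style lock hands out a copy the GPU is not
// currently reading instead of waiting for the GPU to finish.
class RenamedBuffer {
public:
    RenamedBuffer(std::size_t copies, std::size_t bytes)
        : storage_(copies, std::vector<unsigned char>(bytes)),
          busy_(copies, false) {
        assert(copies > 0);
    }

    // Models a lock with discard semantics: the caller does not need the old
    // contents, so any idle copy will do. Returns nullptr when every copy is
    // busy, which is the point where a stall would otherwise happen.
    unsigned char* lockDiscard() {
        for (std::size_t step = 1; step <= storage_.size(); ++step) {
            std::size_t candidate = (current_ + step) % storage_.size();
            if (!busy_[candidate]) {
                current_ = candidate;
                return storage_[candidate].data();
            }
        }
        return nullptr;
    }

    // Models submitting a draw call that reads the current copy.
    void submitToGpu() { busy_[current_] = true; }

    // Models the GPU finishing with the copy at `index`.
    void gpuFinished(std::size_t index) {
        assert(index < busy_.size());
        busy_[index] = false;
    }

    std::size_t currentIndex() const { return current_; }

private:
    std::vector<std::vector<unsigned char>> storage_;
    std::vector<bool> busy_;
    std::size_t current_ = 0;
};
```

In this model two consecutive discard locks return different pointers, which is also why a pointer kept from an earlier lock cannot be trusted once the resource has been unlocked.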