Compute Shader Overview

A compute shader is a programmable shader stage that expands Microsoft Direct3D 11 beyond graphics programming. The compute shader technology is also known as the DirectCompute technology.

Like other programmable shaders (vertex and geometry shaders for example), a compute shader is designed and implemented with HLSL but that is just about where the similarity ends. A compute shader provides high-speed general purpose computing and takes advantage of the large numbers of parallel processors on the graphics processing unit (GPU). The compute shader provides memory sharing and thread synchronization features to allow more effective parallel programming methods. You call the ID3D11DeviceContext::Dispatch or ID3D11DeviceContext::DispatchIndirect method to execute commands in a compute shader. A compute shader can run on many threads in parallel.
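As a minimal sketch of what such a shader can look like, the following kernel doubles every element of a buffer. The resource names, register slots, entry-point name, and the group size of 64 are illustrative assumptions rather than anything prescribed by the text above.

```hlsl
// Hypothetical resources: the application binds an SRV to t0 and a UAV to u0.
StructuredBuffer<float>   gInput  : register(t0);
RWStructuredBuffer<float> gOutput : register(u0);

// 64 threads per group along X; each thread handles one element.
[numthreads(64, 1, 1)]
void CSMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint count, stride;
    gInput.GetDimensions(count, stride);

    // The last group may extend past the end of the buffer, so guard the access.
    if (dispatchThreadID.x < count)
    {
        gOutput[dispatchThreadID.x] = gInput[dispatchThreadID.x] * 2.0f;
    }
}
```

With 64 threads per group, an application would typically pass the element count divided by 64, rounded up, as the first argument to ID3D11DeviceContext::Dispatch and 1 for the other two arguments.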

Using Compute Shader on Direct3D 10.x Hardware

A compute shader on Microsoft Direct3D 10 is also known as DirectCompute 4.x.

If you use the Direct3D 11 API and updated drivers, feature level 10 and 10.1 Direct3D hardware can optionally support a limited form of DirectCompute that uses the cs_4_0 and cs_4_1 profiles. When you use DirectCompute on this hardware, keep the following limitations in mind:

  • The maximum number of threads is limited to D3D11_CS_4_X_THREAD_GROUP_MAX_THREADS_PER_GROUP (768) per group.
  • The X and Y dimension of numthreads is limited to D3D11_CS_4_X_THREAD_GROUP_MAX_X (768) and D3D11_CS_4_X_THREAD_GROUP_MAX_Y (768).
  • The Z dimension of numthreads is limited to 1.
  • The Z dimension of dispatch is limited to D3D11_CS_4_X_DISPATCH_MAX_THREAD_GROUPS_IN_Z_DIMENSION (1).
  • Only one unordered-access view can be bound to the shader (D3D11_CS_4_X_UAV_REGISTER_COUNT is 1).
  • Only RWStructuredBuffers and RWByteAddressBuffers are available as unordered-access views.
  • A thread can only access its own region in groupshared memory for writing, though it can read from any location.
  • SV_GroupIndex or SV_GroupThreadID must be used when accessing groupshared memory for writing.
  • Groupshared memory is limited to 16KB per group.
  • A single thread is limited to a 256 byte region of groupshared memory for writing.
  • No atomic instructions are available.
  • No double-precision values are available.
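The groupshared rules in this list can be illustrated with a short kernel. This is a sketch under assumptions: the buffer names and the group size of 64 are hypothetical, and the shader is intended for the cs_4_0 profile, so it uses one UAV, a Z dimension of 1, and no atomics.

```hlsl
#define GROUP_SIZE 64

StructuredBuffer<float>   gInput  : register(t0);
RWStructuredBuffer<float> gOutput : register(u0);   // the single UAV allowed on cs_4_x

// 64 floats = 256 bytes in total, well under the 16KB per-group limit.
groupshared float sData[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint groupIndex : SV_GroupIndex,
            uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // Each thread writes only its own slot, addressed with SV_GroupIndex.
    sData[groupIndex] = gInput[dispatchThreadID.x];

    // Wait until every thread in the group has finished writing.
    GroupMemoryBarrierWithGroupSync();

    // Reading any location is allowed, so a thread may read its neighbor's slot.
    uint neighbor = (groupIndex + 1) % GROUP_SIZE;
    gOutput[dispatchThreadID.x] = sData[groupIndex] + sData[neighbor];
}
```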

Using Compute Shader on Direct3D 11.x Hardware

A compute shader on Direct3D 11 is also known as DirectCompute 5.0.

When you use DirectCompute with cs_5_0 profiles, keep the following items in mind:

  • The maximum number of threads is limited to D3D11_CS_THREAD_GROUP_MAX_THREADS_PER_GROUP (1024) per group.
  • The X and Y dimension of numthreads is limited to D3D11_CS_THREAD_GROUP_MAX_X (1024) and D3D11_CS_THREAD_GROUP_MAX_Y (1024).
  • The Z dimension of numthreads is limited to D3D11_CS_THREAD_GROUP_MAX_Z (64).
  • The maximum dimension of dispatch is limited to D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION (65535).
  • The maximum number of unordered-access views that can be bound to a shader is D3D11_PS_CS_UAV_REGISTER_COUNT (8).
  • Supports RWStructuredBuffers, RWByteAddressBuffers, and typed unordered-access views (RWTexture1D, RWTexture2D, RWTexture3D, and so on).
  • Atomic instructions are available.
  • Double-precision support might be available. For information about how to determine whether double-precision is available, see D3D11_FEATURE_DOUBLES.
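A short sketch of features that the list above makes available only with cs_5_0: a thread group with a Z dimension greater than 1 and a typed unordered-access view. The texture name and the 4x4x4 group size are assumptions chosen for illustration.

```hlsl
// Hypothetical volume texture bound as a typed UAV.
RWTexture3D<float> gVolume : register(u0);

// 4 * 4 * 4 = 64 threads per group; a Z dimension above 1 needs cs_5_0.
[numthreads(4, 4, 4)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    uint width, height, depth;
    gVolume.GetDimensions(width, height, depth);

    // Skip threads that fall outside the volume.
    if (any(id >= uint3(width, height, depth)))
    {
        return;
    }

    // Store the distance of each voxel center from the middle of the volume,
    // measured in normalized [0,1] texture space.
    float3 p = ((float3)id + 0.5f) / float3(width, height, depth) - 0.5f;
    gVolume[id] = length(p);
}
```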

New Resource Types

Read/Write Buffers and Textures

Shader model 4 resources are read only. Shader model 5 implements a new corresponding set of read/write resources:

  • RWBuffer
  • RWTexture1D, RWTexture1DArray
  • RWTexture2D, RWTexture2DArray
  • RWTexture3D

These resources require a resource variable to access memory (through indexing) because there are no methods for accessing memory directly. A resource variable can be used on the left and right sides of an assignment; if used on the right side, the template type must be single component (float, int, or uint).
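The following sketch shows a resource variable on both sides of an assignment. It assumes a hypothetical single-component texture named gHeight bound to u0, which keeps the read on the right side within the single-component restriction.

```hlsl
// Single-component template type, so the texture can be read as well as written.
RWTexture2D<float> gHeight : register(u0);

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    uint width, height;
    gHeight.GetDimensions(width, height);
    if (id.x >= width || id.y >= height)
    {
        return;
    }

    // Right side: indexed read of one texel.
    float h = gHeight[id.xy];

    // Left side: indexed write back to the same texel.
    gHeight[id.xy] = saturate(h * 1.1f);
}
```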

Structured Buffer

A structured buffer is a buffer that contains elements of equal sizes. Use a structure with one or more member types to define an element. Here is a structure with three members.

struct MyStruct
{
    float4 Color;
    float4 Normal;
    bool isAwesome;
};

Use this structure to declare a structured buffer like this:

StructuredBuffer<MyStruct> mySB;

In addition to indexing, a structured buffer supports accessing a single member like this:

float4 myColor = mySB[27].Color;

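Use the following object types to access a structured buffer:

  • StructuredBuffer is a read-only structured buffer.
  • RWStructuredBuffer is a read/write structured buffer.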

Atomic functions, which implement interlocked operations, are allowed on int and uint elements in an RWStructuredBuffer.
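As a sketch of an interlocked operation on a uint member of an RWStructuredBuffer, the kernel below counts hits per target. The Target structure, the buffer names, and the idea of a separate list of hit indices are all hypothetical, and the shader assumes the cs_5_0 profile because it uses an atomic instruction.

```hlsl
struct Target
{
    float4 Color;
    uint   HitCount;
};

// Hypothetical inputs: each entry of gHits is the index of the target that was hit.
StructuredBuffer<uint>     gHits    : register(t0);
RWStructuredBuffer<Target> gTargets : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    uint hitCount, hitStride;
    gHits.GetDimensions(hitCount, hitStride);
    if (id.x >= hitCount)
    {
        return;
    }

    uint targetCount, targetStride;
    gTargets.GetDimensions(targetCount, targetStride);

    uint target = gHits[id.x];
    if (target < targetCount)
    {
        // Many threads may hit the same target, so the increment must be atomic.
        InterlockedAdd(gTargets[target].HitCount, 1);
    }
}
```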

Byte Address Buffer

A byte address buffer is a buffer whose contents are addressable by a byte offset. Normally, the contents of a buffer are indexed per element using a stride for each element (S) and the element number (N) as given by S*N. A byte address buffer, which can also be called a raw buffer, uses a byte value offset from the beginning of the buffer to access data. The byte value must be a multiple of four so that it is DWORD aligned. If any other value is provided, behavior is undefined.

Shader model 5 introduces objects for accessing a read-only byte address buffer (ByteAddressBuffer) as well as a read-write byte address buffer (RWByteAddressBuffer). The contents of a byte address buffer are designed to be 32-bit unsigned integers; if a value in the buffer is not really an unsigned integer, use a function such as asfloat to reinterpret it when you read it.
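A minimal sketch of byte-offset access, assuming two hypothetical raw buffers that actually hold float data. Each thread computes a DWORD-aligned offset, reinterprets the loaded bits with asfloat, and stores the result with asuint.

```hlsl
ByteAddressBuffer   gRawIn  : register(t0);
RWByteAddressBuffer gRawOut : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    uint byteCount;
    gRawIn.GetDimensions(byteCount);

    // One 32-bit value per thread; multiplying by 4 keeps the offset DWORD aligned.
    uint offset = id.x * 4;
    if (offset + 4 > byteCount)
    {
        return;
    }

    // Load returns a uint; asfloat reinterprets the bits without converting the value.
    float value = asfloat(gRawIn.Load(offset));

    // Store expects a uint, so reinterpret the float bits on the way out.
    gRawOut.Store(offset, asuint(value * 0.5f));
}
```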

Unordered Access Buffer or Texture

An unordered access resource (which includes buffers, textures, and texture arrays, all without multisampling) allows temporally unordered read/write access from multiple threads. This means that this resource type can be read and written simultaneously by multiple threads without generating memory conflicts through the use of atomic functions.

Create an unordered access buffer or texture by calling a function such as ID3D11Device::CreateBuffer or ID3D11Device::CreateTexture2D and passing in the D3D11_BIND_UNORDERED_ACCESS flag from the D3D11_BIND_FLAG enumeration.

Unordered access resources can only be bound to pixel shaders and compute shaders. During execution, pixel shaders or compute shaders running in parallel have the same unordered access resources bound.

Append and Consume Buffer

An append and consume buffer is a special type of an unordered resource that supports adding and removing values from the end of a buffer similar to the way a stack works.

An append and consume buffer must be a structured buffer:

  • AppendStructuredBuffer
  • ConsumeStructuredBuffer

Use these resources through their methods; they do not use resource variables.
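The sketch below moves live particles from one buffer to another with the Consume and Append methods. The Particle structure, the constant buffer, and all names are assumptions for illustration. The shader binds two unordered-access views, so it assumes the cs_5_0 profile.

```hlsl
struct Particle
{
    float3 Position;
    float  Life;
};

// Hypothetical constants supplied by the application.
cbuffer SimParams : register(b0)
{
    uint  gParticleCount;   // number of elements currently in gCurrent
    float gDeltaTime;       // time step in seconds
};

ConsumeStructuredBuffer<Particle> gCurrent : register(u0);
AppendStructuredBuffer<Particle>  gNext    : register(u1);

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // Never consume more elements than the buffer holds.
    if (id.x >= gParticleCount)
    {
        return;
    }

    // Remove one element from the end of the input buffer.
    Particle p = gCurrent.Consume();
    p.Life -= gDeltaTime;

    // Append only the particles that are still alive.
    if (p.Life > 0.0f)
    {
        gNext.Append(p);
    }
}
```

On the application side, the views for such buffers are created with the D3D11_BUFFER_UAV_FLAG_APPEND flag, and ID3D11DeviceContext::CopyStructureCount can copy the hidden element counter into another buffer.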

Accessing Resources

Access By Byte Offset

Two new buffer types can be accessed using a byte offset:

  • ByteAddressBuffer is a read-only buffer.
  • RWByteAddressBuffer is a read/write buffer.

Access By Index

Resource types can use an index to reference a specific location in the resource. Consider this example:

uint2 pos;
Texture2D<float4> myTexture;
float4 myVar = myTexture[pos];

This example assigns the 4 float values that are stored at the texel located at the pos position in the myTexture 2D texture resource to the myVar variable.

Note: The default for accessing a texture in this way is mipmap level zero (the most detailed level).

Note: The "float4 myVar = myTexture[pos];" line is equivalent to "float4 myVar = myTexture.Load(uint3(pos,0));". Access by index is a new HLSL syntax enhancement.

Note: The compiler in the June 2010 version of the DirectX SDK and later lets you index all resource types except for byte address buffers.

Note: The June 2010 compiler and later lets you declare local resource variables. You can assign globally defined resources (like myTexture) to these variables and use them the same way as their global counterparts.

Access By Mips Method

Texture objects have a mips method (for example, Texture2D.mips), which you can use to specify the mipmap level. This example reads the color stored at (7,16) on mipmap level 2 in a 2D texture:

uint x = 7;
uint y = 16;
float4 myColor = myTexture.mips[2][uint2(x,y)];

This is an enhancement from the June 2010 compiler and later. The "myTexture.mips[2][uint2(x,y)]" expression is equivalent to "myTexture.Load(uint3(x,y,2))".
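As a sketch of the mips method inside a complete shader, the kernel below copies mipmap level 1 of a source texture into a separate destination texture. The names gSource and gDest are hypothetical, and the destination is assumed to be at least as large as level 1 of the source.

```hlsl
Texture2D<float4>   gSource : register(t0);
RWTexture2D<float4> gDest   : register(u0);

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // Query level 0 to learn how many mipmap levels the texture has.
    uint width0, height0, levels;
    gSource.GetDimensions(0, width0, height0, levels);
    if (levels < 2)
    {
        return;
    }

    // Each mipmap level halves the size of the previous one, down to a minimum of 1.
    uint width1  = max(width0 >> 1, 1);
    uint height1 = max(height0 >> 1, 1);
    if (id.x >= width1 || id.y >= height1)
    {
        return;
    }

    // Same result as gSource.Load(int3(id.xy, 1)).
    gDest[id.xy] = gSource.mips[1][id.xy];
}
```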

Atomic Functions

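To update a new resource type or shared memory from several threads at once, use an interlocked intrinsic function. Interlocked functions are guaranteed to operate atomically. That is, each one completes as a single indivisible operation that other threads cannot interleave with. This section lists the atomic functions:

  • InterlockedAdd
  • InterlockedMin
  • InterlockedMax
  • InterlockedOr
  • InterlockedAnd
  • InterlockedXor
  • InterlockedCompareStore
  • InterlockedCompareExchange
  • InterlockedExchange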
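A sketch of interlocked functions on shared memory and on a raw buffer, assuming the cs_5_0 profile. The kernel counts the input values above a threshold: each group accumulates into a groupshared counter, and one thread per group then adds that partial count to a global total. The buffer names and the 0.5 threshold are illustrative assumptions.

```hlsl
StructuredBuffer<float> gValues : register(t0);
RWByteAddressBuffer     gResult : register(u0);   // total stored at byte offset 0

groupshared uint sGroupCount;

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID, uint groupIndex : SV_GroupIndex)
{
    // One thread clears the shared counter before anyone uses it.
    if (groupIndex == 0)
    {
        sGroupCount = 0;
    }
    GroupMemoryBarrierWithGroupSync();

    uint count, stride;
    gValues.GetDimensions(count, stride);

    // Several threads may increment at the same time, so the add must be atomic.
    if (id.x < count && gValues[id.x] > 0.5f)
    {
        InterlockedAdd(sGroupCount, 1);
    }
    GroupMemoryBarrierWithGroupSync();

    // One thread per group publishes the partial count to the global total.
    if (groupIndex == 0)
    {
        uint previous;
        gResult.InterlockedAdd(0, sGroupCount, previous);
    }
}
```

Accumulating in shared memory first means only one atomic operation per group reaches the global buffer, which usually reduces contention. The application is assumed to clear the first four bytes of gResult before dispatching.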
