Introduction to the DirectX 9 High-Level Shader Language

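This article gives a detailed introduction to the DirectX 9 High-Level Shader Language (HLSL), covering language basics, intrinsic functions, ways of integrating it into an engine, and optimization strategies. Through example shaders, it shows how HLSL lets you think at the algorithm level and avoid hardware details. The article also covers how HLSL interacts with Direct3D and the D3DX library, and how to integrate HLSL shaders into an engine without using D3DX Effects.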

Craig Peeper
Microsoft Corporation

Jason L. Mitchell
ATI Research

July 2003

Applies to:
   DirectX® 9 High-Level Shader Language

Summary: In an excerpt from the forthcoming book ShaderX2 - Introduction and Tutorials with DirectX 9, Craig Peeper and Jason Mitchell present a detailed introduction to the Microsoft DirectX High-Level Shader Language, including a number of example shaders and optimization strategies. (37 printed pages)

Contents:

Introduction
A Simple Example
Assembly Language and Compile Targets
Language Basics
Intrinsics
Integration into an Engine Without Using D3DX Effects
SDK Updates
Conclusion
Acknowledgements

Introduction

One of the most empowering new components of DirectX® 9 is the High-Level Shader Language (HLSL). Using this standard high-level language, shader writers are able to think at the algorithm level while implementing shaders, rather than worry about meddlesome hardware details such as register allocation, register read-port limits, instruction co-issuing and so on. In addition to freeing the developer from hardware details, the HLSL also has all of the usual advantages of a high-level language such as easy code reuse, improved readability and the presence of an optimizing compiler. Many of the chapters in this book and in the ShaderX2 - Shader Tips & Tricks book will utilize shaders which are written in HLSL. As a result, it will be much easier for you to understand and work with those shaders after reading this introductory chapter.

In this chapter, we will outline the basic structure of the language itself as well as strategies for integrating HLSL shaders into your application.

A Simple Example

Before presenting an exhaustive description of the HLSL, let's first have a look at one HLSL vertex shader and one HLSL pixel shader taken from an application which renders simple procedural wood. The first HLSL shader shown below is a simple vertex shader:

float4x4 view_proj_matrix;
float4x4 texture_matrix0;

struct VS_OUTPUT
{
   float4 Pos     : POSITION;
   float3 Pshade  : TEXCOORD0;
};


VS_OUTPUT main (float4 vPosition : POSITION)
{
   VS_OUTPUT Out = (VS_OUTPUT) 0; 

   // Transform position to clip space
   Out.Pos = mul (view_proj_matrix, vPosition);

   // Transform Pshade
   Out.Pshade = mul (texture_matrix0, vPosition);

   return Out;
}

The first two lines of this shader declare a pair of 4 × 4 matrices called view_proj_matrix and texture_matrix0. Following these global-scope matrices, a structure is declared. This VS_OUTPUT structure has two members: a float4 called Pos and a float3 called Pshade.

The main function for this shader takes a single float4 input parameter and returns a VS_OUTPUT structure. The float4 input vPosition is the sole input to the shader while the returned VS_OUTPUT struct defines this vertex shader's output. For now, don't worry about the POSITION and TEXCOORD0 keywords following these parameters and structure members. These are called semantics and their meaning will be discussed later in this chapter.

Looking at the actual code body of the main function, you'll see that an intrinsic function called mul is used to multiply the input vPosition vector by the view_proj_matrix matrix. This intrinsic is very commonly used in vertex shaders to perform vector-matrix multiplication. In this case, vPosition is treated as a column vector since it is the second parameter to mul. If the vPosition vector were the first parameter to mul, it would be treated as a row vector. The mul intrinsic and other intrinsics will be discussed in more detail later in the chapter. Following the transformation of the input position vPosition to clip space, vPosition is multiplied by another matrix called texture_matrix0 to generate a 3D texture coordinate. The results of both of these transformations have been written to members of a VS_OUTPUT structure, which is returned. A vertex shader must always output a clip-space position at a minimum. Any additional values output from the vertex shader are interpolated across the rasterized polygon and are available as inputs to the pixel shader. In this case, the 3D Pshade is passed from the vertex to the pixel shader via an interpolator.
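
To make the row-versus-column distinction concrete, here is a small CPU-side sketch in C++ of the two vector forms of mul. It only illustrates the arithmetic and is not D3DX code: the Float4 and Float4x4 types, the mulColumn and mulRow helpers, and the sample matrix are all names assumed for this example.

```cpp
// CPU-side illustration of HLSL's mul(matrix, vector) versus mul(vector, matrix).
// All types and names here are hypothetical helpers, not part of Direct3D or D3DX.
struct Float4   { float v[4]; };
struct Float4x4 { float m[4][4]; };  // indexed as m[row][column]

// Mirrors mul(M, v): v is treated as a column vector, so component i of the
// result is row i of M dotted with v.
constexpr Float4 mulColumn(Float4x4 M, Float4 v) {
    Float4 r = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.v[i] += M.m[i][j] * v.v[j];
    return r;
}

// Mirrors mul(v, M): v is treated as a row vector, so component j of the
// result is v dotted with column j of M.
constexpr Float4 mulRow(Float4 v, Float4x4 M) {
    Float4 r = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.v[j] += v.v[i] * M.m[i][j];
    return r;
}

// A translation by (5, 6, 7) laid out for column vectors (offsets in the last column).
constexpr Float4x4 kTranslate = {{{1, 0, 0, 5}, {0, 1, 0, 6}, {0, 0, 1, 7}, {0, 0, 0, 1}}};
constexpr Float4   kPoint     = {{1, 2, 3, 1}};

constexpr Float4 kAsColumn = mulColumn(kTranslate, kPoint);  // {6, 8, 10, 1}
constexpr Float4 kAsRow    = mulRow(kPoint, kTranslate);     // {1, 2, 3, 39}
```

The same matrix gives different results in the two forms, which is why the matrix layout your application uploads has to match the argument order used in the shader.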

Below, we see a simple HLSL procedural wood pixel shader. This pixel shader, which is written to work with the vertex shader we just described, will be compiled for the ps_2_0 target.

float4 lightWood; // xyz == Light Wood Color
float4 darkWood;  // xyz == Dark Wood Color
float  ringFreq;  // ring frequency

sampler PulseTrainSampler;

float4 hlsl_rings (float4 Pshade : TEXCOORD0) : COLOR
{
    float scaledDistFromZAxis = sqrt(dot(Pshade.xy, Pshade.xy)) * ringFreq;

    float blendFactor = tex1D (PulseTrainSampler, scaledDistFromZAxis);
 
    return lerp (darkWood, lightWood, blendFactor);
}

The first few lines of this shader are the declaration of a pair of floating-point 4-tuples and one scalar float at global scope. Following these variables, a sampler called PulseTrainSampler is declared. Samplers will be discussed in more detail later in the chapter but for now you can just think of a sampler as a window into video memory with associated state defining things like filtering, and texture coordinate addressing modes. With variable and sampler declarations out of the way, we move on to the body of the shader code. You can see that there is one input parameter called Pshade, which is interpolated across the polygon. This is the value that was computed at each vertex by the vertex shader above. In the pixel shader, the Cartesian distance from the shader-space z axis is computed, scaled and used as a 1D texture coordinate to access the texture bound to the PulseTrainSampler. The scalar color that is returned from the tex1D() sampling function is used as a blend factor to blend between the two constant colors (lightWood and darkWood) declared at global scope of the shader. The 4D vector result of this blend is the final output of the pixel shader. All pixel shaders must return a 4D RGBA color at a minimum. We will discuss additional optional pixel shader outputs later in the chapter.
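
The arithmetic in this pixel shader is small enough to mirror on the CPU. The C++ sketch below is an illustration under stated assumptions rather than the authors' code: hlslLerp (a name made up here) follows the definition of the lerp intrinsic, x + s(y - x), for a single channel, and scaledDistFromZAxis repeats the shader's distance computation. The texture lookup is left out because the contents of the pulse train texture are not given here.

```cpp
#include <cmath>

// Distance of the point (x, y, z) from the z axis, scaled by the ring frequency.
// Same expression as the shader's sqrt(dot(Pshade.xy, Pshade.xy)) * ringFreq.
inline float scaledDistFromZAxis(float x, float y, float ringFreq) {
    return std::sqrt(x * x + y * y) * ringFreq;
}

// One channel of lerp(a, b, s) = a + s * (b - a).
// hlslLerp is a hypothetical helper name, chosen to avoid clashing with std::lerp.
constexpr float hlslLerp(float a, float b, float s) {
    return a + s * (b - a);
}
```

With a blend factor of 0 the result is the first argument and with 1 it is the second, so lerp (darkWood, lightWood, blendFactor) yields darkWood where the pulse train is 0 and lightWood where it is 1.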

Assembly Language and Compile Targets

Now that we have seen a few HLSL shaders, we'll discuss briefly how the language relates to Direct3D, D3DX, assembly shader models and your application. Shaders were first added to Direct3D in DirectX 8. At that time, several virtual shader machines were defined, each roughly corresponding to a particular graphics processor produced by one of the top 3D graphics hardware vendors. For each of these virtual shader machines, an assembly language was designed. In DirectX 8.0 and DirectX 8.1, programs written to these shader models (named vs_1_1 and ps_1_1 through ps_1_4) were relatively short and were generally written by developers directly in the appropriate assembly language. As shown on the left side of Figure 1, the application would pass this human-readable assembly language code to the D3DX library via D3DXAssembleShader() and get back a binary representation of the shader which would in turn be passed to Direct3D via CreatePixelShader() or CreateVertexShader(). For more on the details of the legacy assembly shader models, please refer to the many resources available online and offline, including Shader X and the DirectX SDK.

Figure 1. Use of D3DX for Assembly and Compilation in DirectX 8 and DirectX 9

As shown on the right side of Figure 1, the situation in DirectX 9 is very similar in that the application passes an HLSL shader to D3DX via the D3DXCompileShader() API and gets back a binary representation of the compiled shader which is in turn passed to Direct3D via CreatePixelShader() or CreateVertexShader(). The binary asm code generated is a function only of the compile target chosen, not the specific graphics device in the user's or developer's system. That is, the binary asm which is generated is vendor-neutral and will be the same no matter where you compile or run it. In fact, the Direct3D runtime itself does not know anything about HLSL, only the binary assembly shader models. This is nice because it means that the HLSL compiler can be updated independent of the Direct3D runtime. In fact, between press time and the release of the first printing of this book in late summer 2003, Microsoft plans to release a DirectX SDK Update which will contain an updated HLSL compiler.

In addition to the development of the HLSL compiler in D3DX, DirectX 9.0 also introduced additional assembly-level shader models to expose the functionality of the latest generation of 3D graphics hardware. Application developers can feel free to work directly in the assembly languages for these new models (vs_2_0, vs_3_0, ps_2_0 and ps_3_0) but we expect most developers to move wholesale to HLSL for shader development.

Hardware Realities

Of course, just because you can write an HLSL program to express a particular shading algorithm doesn't mean it will run on a given piece of hardware. As we discussed earlier, an application calls D3DX to compile an HLSL shader to binary asm via the D3DXCompileShader() API. One of the parameters to this API entry point specifies which of the assembly language models (or compile targets) the HLSL compiler should use to express the final shader code. If an application is doing HLSL shader compilation at run time (as opposed to offline), the application could examine the capabilities of the Direct3D device and select the compile target to match. If the algorithm expressed in the HLSL shader is too complex to execute on the selected compile target, compilation will fail. What this means is that while HLSL is a huge benefit to shader development, it does not free developers from the realities of shipping games to a target audience which owns graphics devices of varying capabilities. As a game developer, you still have to manage a tiered approach to your visuals, writing better shaders for better graphics cards and more basic versions for older cards. With well-written HLSL, however, this burden can be eased significantly.
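
The tiered approach can be sketched as a small selection function. The C++ below is a hypothetical helper, not SDK code: PixelTarget, atLeast, pickPixelTarget and targetName are names invented for this example, and only four tiers are considered. In a real application the version numbers would come from the device capabilities (for example the PixelShaderVersion field of D3DCAPS9) and the resulting string would be passed as the target profile to D3DXCompileShader(); neither call is made here.

```cpp
// Hypothetical tier selection: map the pixel shader version a device reports
// to one of four compile targets. Intermediate models are ignored for brevity.
enum class PixelTarget { None, ps_1_1, ps_1_4, ps_2_0, ps_3_0 };

// True when version (major.minor) is at least (reqMajor.reqMinor).
constexpr bool atLeast(unsigned major, unsigned minor,
                       unsigned reqMajor, unsigned reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Pick the most capable of the four tiers that the reported version supports.
constexpr PixelTarget pickPixelTarget(unsigned major, unsigned minor) {
    return atLeast(major, minor, 3, 0) ? PixelTarget::ps_3_0
         : atLeast(major, minor, 2, 0) ? PixelTarget::ps_2_0
         : atLeast(major, minor, 1, 4) ? PixelTarget::ps_1_4
         : atLeast(major, minor, 1, 1) ? PixelTarget::ps_1_1
         : PixelTarget::None;
}

// Profile string for the chosen tier; nullptr means no shader path is available.
constexpr const char* targetName(PixelTarget t) {
    return t == PixelTarget::ps_3_0 ? "ps_3_0"
         : t == PixelTarget::ps_2_0 ? "ps_2_0"
         : t == PixelTarget::ps_1_4 ? "ps_1_4"
         : t == PixelTarget::ps_1_1 ? "ps_1_1"
         : nullptr;
}
```

An engine would typically keep one HLSL variant per tier and compile the variant that matches the selected target, falling back to a fixed-function path when no target is available.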

Compilation Failure

As mentioned above, failure of a given HLSL shader to compile for a particular compile target is an indication that the shader is too complex for the compile target. This can mean that the shader either requires too many resources or it requires some capability, such as dynamic branching, that is not supported by the chosen compile target. For example, an HLSL shader could be written to access a given texture map six times in a shader. If this shader is compiled for the ps_1_1 compile target, compilation will fail since the ps_1_1 model supports only four textures. Another common source of compilation failure is exceeding the maximum instruction count of the chosen compile target. An algorithm expressed in HLSL may simply require too many instructions to be executed by a given compile target.

It is important to note that the choice of compile target does not restrict the HLSL syntax that a shader writer can use. For example, a shader writer can use 'for' loops, subroutines, 'if-else' statements etc. and still compile for targets which don't natively support looping, branching or 'if-else' statements. In such cases, the compiler will unroll loops, inline function calls and execute both branches of an 'if-else' statement, selecting the proper result based upon the original value used in the 'if-else' statement. Of course, if the resulting shader is too long or otherwise exceeds the resources of the compile target, compilation will fail.
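
The two transformations described above can be illustrated on the CPU. The C++ sketch below is not compiler output; it only shows, under assumed names (diffuseBranching, diffuseFlattened, Taps, sumTapsLoop, sumTapsUnrolled) and made-up lighting constants, that the flattened and unrolled forms compute the same values as the structured forms.

```cpp
// The form a shader author writes: a real branch.
constexpr float diffuseBranching(float nDotL) {
    if (nDotL > 0.0f) {
        return 0.2f + 0.8f * nDotL;  // lit: ambient plus diffuse
    } else {
        return 0.2f;                 // unlit: ambient only
    }
}

// Roughly what happens on a target without branching: both sides are
// evaluated, then one result is selected using the original condition.
constexpr float diffuseFlattened(float nDotL) {
    const float ifResult   = 0.2f + 0.8f * nDotL;
    const float elseResult = 0.2f;
    return (nDotL > 0.0f) ? ifResult : elseResult;  // stands in for a select instruction
}

struct Taps { float w[3]; };  // hypothetical fixed-size input

// A loop with a compile-time trip count...
constexpr float sumTapsLoop(Taps t) {
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i)
        sum += t.w[i];
    return sum;
}

// ...and the same work with the loop unrolled into straight-line code.
constexpr float sumTapsUnrolled(Taps t) {
    return t.w[0] + t.w[1] + t.w[2];
}
```

Note that the flattened version always pays for both sides of the branch, which is one reason a shader that looks short in HLSL can still exceed the instruction limit of an older target.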

The Commandline Compiler: FXC

Rather than compile HLSL shaders using D3DX on the customer's machine at application load time or at first use, many developers choose to compile their shaders from HLSL to binary asm before they even ship. This keeps their HLSL source away from prying eyes and also ensures that all of the shaders that their app will ever run have gone through their internal quality assurance process. A convenient utility which allows developers to compile shaders offline is the fxc commandline compiler which is provided in the DirectX 9.0 SDK. This utility has a number of convenient options that you can use to not only compile your shaders on the commandline but also generate disassembled code for the specified compile target. Studying the disassembled output can be very educational during development if you want to optimize your shaders or just generally get to know the virtual shader machine's capabilities at a more detailed level. These commandline options are summarized in Table 1.

Table 1. FXC commandline options

Option        Description
-T target     Compile target (default: vs_2_0)
-E name       Entrypoint name (default: main)
-Od           Disable optimizations
-Vd           Disable validation
-Zi           Enable debugging information
-Zpr          Pack matrices in row-major order
-Zpc          Pack matrices in column-major order
-Fo file      Output object file
-Fc file      Output listing of generated code
-Fh file      Output header containing generated code
-D id=text    Define macro
-nologo       Suppress copyright message

Now that you understand the context in which the HLSL compiler can be used for shader development, we will discuss the actual mechanics of the language. As we progress, it is important to keep the notion of a compile target and the varying capabilities of the underlying assembly shader models in mind.

Language Basics

Now that you have a sense of what HLSL vertex and pixel shaders look like and how they interact with the low-level assembly shaders, we'll discuss some of the details of the language itself.

Keywords

Keywords are predefined identifiers that are reserved for the HLSL language and cannot be used as identifiers in your program. Keywords marked with '*' are case insensitive.

Table 2. Keywords reserved for HLSL language

asm*        bool          compile        const
decl*       do            double         else
extern      false         float          for
half        if            in             inline
inout       int           matrix*        out
pass*       pixelshader*  return         sampler
shared      static        string*        struct
technique*  texture*      true           typedef
uniform     vector*       vertexshader*  void
volatile    while

The following keywords are currently unused, but are reserved for potential future use:

Table 3. Keywords currently unused but reserved

auto              break      compile       const
char              class      case          catch
default           delete     const_cast    continue
explicit          friend     dynamic_cast  enum
mutable           namespace  goto          long
private           protected  new           operator
reinterpret_cast  short      public        register
static_cast       switch     signed        sizeof
throw             try        template      this
typename          unsigned   using         union
virtual

Datatypes

The HLSL has support for a variety of datatypes, from simple scalars to more complex types such as vectors and matrices.

Scalar Types

The language supports the following scalar datatypes:

Table 4. Scalar datatypes

Type      Description
bool      true or false
int       32-bit signed integer
half      16-bit floating point value
float     32-bit floating point value
double    64-bit floating point value

If you are already familiar with the assembly-level programming models, you will know that graphics processors do not currently have native support for all of these datatypes. As a result, integers may need to be emulated using floating point hardware. This means that integer operations that go outside the range of integers that can be expressed as floats on these platforms are not guaranteed to function as expected. Additionally, not all target platforms have native support for half or double values. If the target platform does not, these will be emulated using float.
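
The range limit on emulated integers can be seen on the CPU as well. The C++ sketch below assumes IEEE 754 single precision, whose 24-bit significand represents every integer up to 2^24 exactly; shader hardware of this generation may use narrower floating-point formats, in which case the usable range is smaller still. The names survivesFloatRoundTrip and kLastExact are made up for this illustration.

```cpp
// True when an integer survives a round trip through a 32-bit float unchanged.
// Assumes IEEE 754 single precision, which has a 24-bit significand.
constexpr bool survivesFloatRoundTrip(int i) {
    return static_cast<int>(static_cast<float>(i)) == i;
}

// 16777216 (2^24): every integer with magnitude up to this value is exact in a float.
constexpr int kLastExact = 1 << 24;
```

Past that point, consecutive integers start to collapse onto the same float value, so loop counters and array indices are fine in practice, while large integer arithmetic is not.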

Vector Types

You will often find yourself declaring vector variables in your HLSL shaders. There are a variety of ways that these vectors can be declared, including the following:

Table 5. Vector types

Declaration              Description
vector                   A vector of dimension 4; each component is of type float.
vector <type, size>      A vector of dimension size; each component is of scalar type type.

The most common way that you will see shader authors declare vectors, however, is by using the name of a type followed by an integer from 2 to 4. To declare a 4-tuple of floats, for example, you could use any of the following vector declarations:

float4 fVector0;
float  fVector1[4];
vector fVector2;
vector <float, 4> fVector3;

To declare a 3-tuple of bools, for example, you could use any of the following declarations:

bool3 bVector0;
bool  bVector1[3];
vector <bool, 3> bVector2;

Once you have defined a vector, you may access its individual components by using the array access syntax or using a swizzle. In the swizzle case, the components must come from either the {x, y, z, w} or {r, g, b, a} name-space (but not both). For example:

float4 pos = {3.0f, 5.0f, 2.0f, 1.0f};
float  value0 = pos[0]; // value0 is 3.0f
float  value1 = pos.x;  // value1 is 3.0f
float  value2 = pos.g;  // value2 is 5.0f
float2 vec0   = pos.xy; // vec0 is {3.0f, 5.0f}
float2 vec1   = pos.ry; // INVALID because of bad swizzle

It should be noted that the ps_2_0 and lower pixel shader models do not have native support for arbitrary swizzles. Hence, concise high-level code which uses swizzles can result in fairly nasty binary asm when compiling to these targets. You should familiarize yourself with the native swizzles available in these assembly models.

Matrix Types

Another very common type of variable you will find yourself using in HLSL shaders is the matrix, which is a 2D array of data. Like scalars and vectors, matrices may be composed of any of the basic datatypes: bool, int, half, float or double. Matrices may be of any size, but you will typically find shader writers using matrices with up to 4 rows and columns. You will recall that the example vertex shader shown at the beginning of the chapter declared two 4 × 4 float matrices at global scope:

float4x4 view_proj_matrix;
float4x4 texture_matrix0;

Naturally, other dimensions of matrices can be used. For example, we could declare a floating-point matrix with 3 rows and 4 columns in a variety of ways:

float3x4            mat0;
matrix<float, 3, 4> mat1;

Like vectors, the individual elements of matrices can be accessed using array or structure/swizzle syntax. For example, the following array indexing syntax can be used to access the top-left element of the matrix view_proj_matrix:

float fValue = view_proj_matrix[0][0];

There is also a structure syntax defined for access to and swizzling of matrix elements. For zero-based row-column position, you can use any of the following:
