Craig Peeper
Microsoft Corporation
Jason L. Mitchell
ATI Research
July 2003
Applies to:
DirectX® 9 High-Level Shader Language
Summary: In an excerpt from the forthcoming book ShaderX2 - Introduction and Tutorials with DirectX 9, Craig Peeper and Jason Mitchell present a detailed introduction to the Microsoft DirectX High-Level Shader Language, including a number of example shaders and optimization strategies.
Contents:
Introduction
A Simple Example
Assembly Language and Compile Targets
Language Basics
Intrinsics
Integration into an Engine Without Using D3DX Effects
SDK Updates
Conclusion
Acknowledgements
Introduction
One of the most empowering new components of DirectX® 9 is the High-Level Shader Language (HLSL). Using this standard high-level language, shader writers are able to think at the algorithm level while implementing shaders, rather than worrying about meddlesome hardware details such as register allocation, register read-port limits, instruction co-issuing and so on. In addition to freeing the developer from hardware details, the HLSL also has all of the usual advantages of a high-level language, such as easy code reuse, improved readability and the presence of an optimizing compiler. Many of the chapters in this book and in the ShaderX2 - Shader Tips & Tricks book use shaders written in HLSL, so reading this introductory chapter will make it much easier for you to understand and work with those shaders.
In this chapter, we will outline the basic structure of the language itself as well as strategies for integrating HLSL shaders into your application.
A Simple Example
Before presenting an exhaustive description of the HLSL, let's first have a look at one HLSL vertex shader and one HLSL pixel shader taken from an application which renders simple procedural wood. The first HLSL shader shown below is a simple vertex shader:
    float4x4 view_proj_matrix;
    float4x4 texture_matrix0;

    struct VS_OUTPUT
    {
        float4 Pos    : POSITION;
        float3 Pshade : TEXCOORD0;
    };

    VS_OUTPUT main (float4 vPosition : POSITION)
    {
        VS_OUTPUT Out = (VS_OUTPUT) 0;

        // Transform position to clip space
        Out.Pos = mul (view_proj_matrix, vPosition);

        // Transform Pshade
        Out.Pshade = mul (texture_matrix0, vPosition);

        return Out;
    }
The first two lines of this shader declare a pair of 4 × 4 matrices called view_proj_matrix and texture_matrix0. Following these global-scope matrices, a structure is declared. This VS_OUTPUT structure has two members: a float4 called Pos and a float3 called Pshade.
The main function for this shader takes a single float4 input parameter and returns a VS_OUTPUT structure. The float4 input vPosition is the sole input to the shader, while the returned VS_OUTPUT struct defines this vertex shader's output. For now, don't worry about the POSITION and TEXCOORD0 keywords following these parameters and structure members. These are called semantics, and their meaning will be discussed later in this chapter.
Looking at the actual code body of the main function, you'll see that an intrinsic function called mul is used to multiply the input vPosition vector by the view_proj_matrix matrix. This intrinsic is very commonly used in vertex shaders to perform vector-matrix multiplication. In this case, vPosition is treated as a column vector because it is the second parameter to mul. If the vPosition vector were the first parameter to mul, it would be treated as a row vector. The mul intrinsic and other intrinsics will be discussed in more detail later in the chapter. After transforming the input position vPosition to clip space, the shader multiplies vPosition by another matrix, texture_matrix0, to generate a 3D texture coordinate. The results of both transformations are written to members of a VS_OUTPUT structure, which is returned. A vertex shader must always output a clip-space position at a minimum. Any additional values output from the vertex shader are interpolated across the rasterized polygon and are available as inputs to the pixel shader. In this case, the 3D texture coordinate Pshade is passed from the vertex shader to the pixel shader via an interpolator.
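To make the row-vector/column-vector distinction concrete, here is a minimal CPU-side sketch in C++ (not HLSL) of what mul() computes for a 4 × 4 matrix and a 4-component vector. The helper names mulMatVec, mulVecMat and transpose are hypothetical. The sketch assumes matrices are stored as m[row][col].

```cpp
#include <array>
#include <cassert>

// CPU-side sketch of HLSL mul() for a 4x4 matrix and a 4-component vector.
// Matrices are stored as m[row][col], matching HLSL's row-first indexing.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;

// mul(M, v): v is treated as a column vector, so result[i] = dot(row i of M, v).
Vec4 mulMatVec(const Mat4& m, const Vec4& v) {
    Vec4 r{};  // zero-initialized accumulator
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// mul(v, M): v is treated as a row vector, so result[j] = dot(v, column j of M).
Vec4 mulVecMat(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
            r[j] += v[k] * m[k][j];
    return r;
}

// Swaps rows and columns.
Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t[i][j] = m[j][i];
    return t;
}
```

mul(v, M) gives the same result as mul(transpose(M), v). An application that switches conventions therefore has to transpose the matrices it supplies.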
Below, we see a simple HLSL procedural wood pixel shader. This pixel shader, which is written to work with the vertex shader we just described, will be compiled for the ps_2_0 target.
    float4 lightWood; // xyz == Light Wood Color
    float4 darkWood;  // xyz == Dark Wood Color
    float  ringFreq;  // ring frequency

    sampler PulseTrainSampler;

    float4 hlsl_rings (float4 Pshade : TEXCOORD0) : COLOR
    {
        float scaledDistFromZAxis = sqrt(dot(Pshade.xy, Pshade.xy)) * ringFreq;

        float blendFactor = tex1D (PulseTrainSampler, scaledDistFromZAxis);

        return lerp (darkWood, lightWood, blendFactor);
    }
The first few lines of this shader declare a pair of floating-point 4-tuples and one scalar float at global scope. Following these variables, a sampler called PulseTrainSampler is declared. Samplers will be discussed in more detail later in the chapter. For now, you can think of a sampler as a window into video memory with associated state defining things like filtering and texture coordinate addressing modes. With the variable and sampler declarations out of the way, we move on to the body of the shader code. There is one input parameter called Pshade, which is interpolated across the polygon. This is the value that the vertex shader above computed at each vertex. In the pixel shader, the Cartesian distance from the shader-space z axis is computed, scaled and used as a 1D texture coordinate to access the texture bound to the PulseTrainSampler. The tex1D() sampling function returns a color. Because the result is assigned to the scalar blendFactor, only its first component is kept. This scalar is used as a blend factor between the two constant colors (lightWood and darkWood) declared at global scope of the shader. The 4D vector result of this blend is the final output of the pixel shader. All pixel shaders must return a 4D RGBA color at a minimum. We will discuss additional optional pixel shader outputs later in the chapter.
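The following C++ sketch reproduces the arithmetic of hlsl_rings on the CPU, which can help when checking expected colors. The excerpt does not describe what the texture bound to PulseTrainSampler contains. samplePulseTrain() is therefore a hypothetical stand-in that assumes wrap addressing and a simple square pulse. woodRings() and lerpColor() are likewise illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Color = std::array<float, 4>;

// Hypothetical stand-in for the 1D texture bound to PulseTrainSampler:
// assumes wrap addressing and a square pulse that is 1 for the first half
// of each period and 0 for the second half.
float samplePulseTrain(float u) {
    float f = u - std::floor(u);  // emulate wrap addressing: keep the fractional part
    return f < 0.5f ? 1.0f : 0.0f;
}

// HLSL lerp(a, b, s) computes a + s * (b - a) per component.
Color lerpColor(const Color& a, const Color& b, float s) {
    Color r{};
    for (int i = 0; i < 4; ++i)
        r[i] = a[i] + s * (b[i] - a[i]);
    return r;
}

// CPU version of hlsl_rings: the scaled distance from the z axis drives the
// pulse-train lookup, whose result blends dark and light wood.
Color woodRings(float px, float py, float ringFreq,
                const Color& darkWood, const Color& lightWood) {
    float scaledDistFromZAxis = std::sqrt(px * px + py * py) * ringFreq;
    float blendFactor = samplePulseTrain(scaledDistFromZAxis);
    return lerpColor(darkWood, lightWood, blendFactor);
}
```

For example, a point 0.75 units from the z axis with ringFreq of 1 falls in the second half of a period and comes out as darkWood. A point 1.25 units away wraps into the first half and comes out as lightWood.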
Assembly Language and Compile Targets
Now that we have seen a few HLSL shaders, we'll discuss briefly how the language relates to Direct3D, D3DX, assembly shader models and your application. Shaders were first added to Direct3D in DirectX 8. At that time, several virtual shader machines were defined, each roughly corresponding to a particular graphics processor produced by one of the top 3D graphics hardware vendors. An assembly language was designed for each of these virtual shader machines. In DirectX 8.0 and DirectX 8.1, programs written to these shader models (named vs_1_1 and ps_1_1 through ps_1_4) were relatively short and were generally written by developers directly in the appropriate assembly language. As shown on the left side of Figure 1, the application would pass this human-readable assembly language code to the D3DX library via D3DXAssembleShader(). It would get back a binary representation of the shader, which would in turn be passed to Direct3D via CreatePixelShader() or CreateVertexShader(). For more on the details of the legacy assembly shader models, please refer to the many resources available online and offline, including ShaderX and the DirectX SDK.
Figure 1. Use of D3DX for Assembly and Compilation in DirectX 8 and DirectX 9
As shown on the right side of Figure 1, the situation in DirectX 9 is very similar. The application passes an HLSL shader to D3DX via the D3DXCompileShader() API and gets back a binary representation of the compiled shader, which is in turn passed to Direct3D via CreatePixelShader() or CreateVertexShader(). The generated binary asm code depends only on the chosen compile target, not on the specific graphics device in the user's or developer's system. That is, the generated binary asm is vendor-neutral and will be the same no matter where you compile or run it. In fact, the Direct3D runtime itself does not know anything about HLSL, only the binary assembly shader models. This is convenient because it means that the HLSL compiler can be updated independently of the Direct3D runtime. Indeed, between press time and the release of the first printing of this book in late summer 2003, Microsoft plans to release a DirectX SDK Update which will contain an updated HLSL compiler.
In addition to the development of the HLSL compiler in D3DX, DirectX 9.0 also introduced additional assembly-level shader models to expose the functionality of the latest generation of 3D graphics hardware. Application developers can feel free to work directly in the assembly languages for these new models (vs_2_0, vs_3_0, ps_2_0 and ps_3_0) but we expect most developers to move wholesale to HLSL for shader development.
Hardware Realities
Of course, just because you can write an HLSL program to express a particular shading algorithm doesn't mean it will run on a given piece of hardware. As we discussed earlier, an application calls D3DX to compile an HLSL shader to binary asm via the D3DXCompileShader() API. One of the parameters to this API entry point defines which of the assembly language models (or compile targets) the HLSL compiler should use to express the final shader code. If an application is doing HLSL shader compilation at run time (as opposed to offline), the application could examine the capabilities of the Direct3D device and select the compile target to match. If the algorithm expressed in the HLSL shader is too complex to execute on the selected compile target, compilation will fail. This means that while HLSL is a huge benefit to shader development, it does not free developers from the realities of shipping games to a target audience which owns graphics devices of varying capabilities. As a game developer, you still have to manage a tiered approach to your visuals, writing better shaders for better graphics cards and more basic versions for older cards. With well-written HLSL, however, this burden can be eased significantly.
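The C++ function below sketches run-time target selection. It maps a reported pixel shader version to the most capable ps_x_y compile target. It assumes the version is encoded the way Direct3D 9's D3DPS_VERSION(major, minor) macro encodes it. psVersion() re-implements that encoding so the example compiles without the DirectX headers, and both function names are hypothetical. In a real application the version would come from the PixelShaderVersion member of D3DCAPS9. The chosen string would then be passed as the profile argument of D3DXCompileShader().

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mirrors the D3DPS_VERSION(major, minor) encoding: 0xFFFF0000 | major << 8 | minor.
constexpr std::uint32_t psVersion(std::uint32_t major, std::uint32_t minor) {
    return 0xFFFF0000u | (major << 8) | minor;
}
static_assert(psVersion(2, 0) == 0xFFFF0200u, "ps_2_0 encodes as 0xFFFF0200");

// Returns the most capable ps_x_y target the reported version supports, or an
// empty string if the version is below ps_1_1.
std::string pickPixelShaderTarget(std::uint32_t reportedVersion) {
    struct Target { std::uint32_t version; const char* name; };
    // Ordered from most to least capable; the first match wins.
    const Target targets[] = {
        {psVersion(3, 0), "ps_3_0"}, {psVersion(2, 0), "ps_2_0"},
        {psVersion(1, 4), "ps_1_4"}, {psVersion(1, 3), "ps_1_3"},
        {psVersion(1, 2), "ps_1_2"}, {psVersion(1, 1), "ps_1_1"},
    };
    for (const Target& t : targets)
        if (reportedVersion >= t.version)
            return t.name;
    return "";
}
```

Even with the target chosen this way, compilation can still fail if the shader is too complex for that target. This is why the tiered approach described above remains necessary.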
Compilation Failure
As mentioned above, failure of a given HLSL shader to compile for a particular compile target indicates that the shader is too complex for that target. The shader either requires too many resources or requires some capability, such as dynamic branching, that the chosen compile target does not support. For example, an HLSL shader could be written to sample a given texture map six times. If this shader is compiled for the ps_1_1 compile target, compilation will fail, since the ps_1_1 model supports at most four texture reads. Another common source of compilation failure is exceeding the maximum instruction count of the chosen compile target. An algorithm expressed in HLSL may simply require more instructions than a given compile target can execute.
It is important to note that the choice of compile target does not restrict the HLSL syntax that a shader writer can use. For example, a shader writer can use 'for' loops, subroutines, 'if-else' statements etc. and still compile for targets which don't natively support looping, branching or 'if-else' statements. In such cases, the compiler will unroll loops, inline function calls and execute both branches of an 'if-else' statement, selecting the proper result based upon the original value used in the 'if-else' statement. Of course, if the resulting shader is too long or otherwise exceeds the resources of the compile target, compilation will fail.
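The C++ function pairs below are a loose CPU analogy, not actual compiler output, for the transformations just described. flattened() evaluates both sides of an if-else and selects one by weighting with a 0/1 condition factor. sumUnrolled() is the fully unrolled form of the fixed-count loop in sumLoop(). All four function names are hypothetical.

```cpp
#include <cassert>

// Original form: a real branch.
float withBranch(float x) {
    if (x > 0.0f)
        return x * 2.0f;
    return -x;
}

// Flattened form: both sides are always computed, then one is selected.
float flattened(float x) {
    float thenValue = x * 2.0f;                      // "then" side, always evaluated
    float elseValue = -x;                            // "else" side, always evaluated
    float cond = static_cast<float>(x > 0.0f);       // compare yielding 1.0 or 0.0
    return cond * thenValue + (1.0f - cond) * elseValue;  // select by weighting
}

// A loop with a trip count known at compile time...
float sumLoop(const float w[4]) {
    float s = 0.0f;
    for (int i = 0; i < 4; ++i)
        s += w[i];
    return s;
}

// ...and its unrolled equivalent, which needs no loop instructions.
float sumUnrolled(const float w[4]) {
    return w[0] + w[1] + w[2] + w[3];
}
```

Flattening costs the instructions of both branches, and unrolling costs one copy of the body per iteration. This is how a shader that looks short in HLSL can exceed a target's instruction limit.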
The Commandline Compiler: FXC
Rather than compile HLSL shaders using D3DX on the customer's machine at application load time or at first use, many developers choose to compile their shaders from HLSL to binary asm before they even ship. This keeps their HLSL source away from prying eyes and also ensures that all of the shaders that their app will ever run have gone through their internal quality assurance process. A convenient utility which allows developers to compile shaders offline is the fxc commandline compiler which is provided in the DirectX 9.0 SDK. This utility has a number of convenient options that you can use to not only compile your shaders on the commandline but also generate disassembled code for the specified compile target. Studying the disassembled output can be very educational during development if you want to optimize your shaders or just generally get to know the virtual shader machine's capabilities at a more detailed level. These commandline options are summarized in Table 1.
Table 1. FXC commandline options
Option | Description
-T target | compile target (default: vs_2_0)
-E name | entry point name (default: main)
-Od | disable optimizations
-Vd | disable validation
-Zi | enable debugging information
-Zpr | pack matrices in row-major order
-Zpc | pack matrices in column-major order
-Fo file | output object file
-Fc file | output listing of generated code
-Fh file | output header containing generated code
-D id=text | define macro
-nologo | suppress copyright message
Now that you understand the context in which the HLSL compiler can be used for shader development, we will discuss the actual mechanics of the language. As we progress, it is important to keep the notion of a compile target and the varying capabilities of the underlying assembly shader models in mind.
Language Basics
Now that you have a sense of what HLSL vertex and pixel shaders look like and how they interact with the low-level assembly shaders, we'll discuss some of the details of the language itself.
Keywords
Keywords are predefined identifiers that are reserved for the HLSL language and cannot be used as identifiers in your program. Keywords marked with '*' are case insensitive.
Table 2. Keywords reserved for HLSL language
asm* | bool | compile | const | decl* | do
double | else | extern | false | float | for
half | if | in | inline | inout | int
matrix* | out | pass* | pixelshader* | return | sampler
shared | static | string* | struct | technique* | texture*
true | typedef | uniform | vector* | vertexshader* | void
volatile | while
The following keywords are currently unused but are reserved for potential future use:
Table 3. Keywords currently unused but reserved
auto | break | case | catch | char | class
const_cast | continue | default | delete | dynamic_cast | enum
explicit | friend | goto | long | mutable | namespace
new | operator | private | protected | public | register
reinterpret_cast | short | signed | sizeof | static_cast | switch
template | this | throw | try | typename | union
unsigned | using | virtual
Datatypes
The HLSL has support for a variety of datatypes, from simple scalars to more complex types such as vectors and matrices.
Scalar Types
The language supports the following scalar datatypes:
Table 4. Scalar datatypes
Type | Description
bool | true or false
int | 32-bit signed integer
half | 16-bit floating point value
float | 32-bit floating point value
double | 64-bit floating point value
If you are already familiar with the assembly-level programming models, you will know that graphics processors do not currently have native support for all of these datatypes. As a result, integers may need to be emulated using floating point hardware. This means that integer operations that go outside the range of integers that can be expressed as floats on these platforms are not guaranteed to function as expected. Additionally, not all target platforms have native support for half or double values. If the target platform does not, these will be emulated using float.
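A short C++ sketch shows where float-based integer emulation breaks down. It assumes IEEE 754 single precision, whose 24-bit significand represents every integer up to 2^24 (16,777,216) exactly. Beyond that value, an integer such as 2^24 + 1 rounds to a neighboring value. The function names are hypothetical, and real shader hardware may round differently.

```cpp
#include <cassert>
#include <cstdint>

// True if converting n to float and back gives n again.
bool floatHoldsIntExactly(std::int32_t n) {
    float f = static_cast<float>(n);
    return static_cast<std::int32_t>(f) == n;
}

// An "emulated" integer add carried out in float arithmetic, as on
// hardware without native integer support.
std::int32_t emulatedAdd(std::int32_t a, std::int32_t b) {
    float sum = static_cast<float>(a) + static_cast<float>(b);
    return static_cast<std::int32_t>(sum);
}
```

Small values such as 1000 + 24 behave as expected. Adding 1 to 2^24, however, leaves the value unchanged because 2^24 + 1 cannot be stored in a float.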
Vector Types
You will often find yourself declaring vector variables in your HLSL shaders. There are a variety of ways that these vectors can be declared, including the following:
Table 5. Vector types
Declaration | Description
vector | A vector of dimension 4; each component is of type float.
vector < type, size > | A vector of dimension size; each component is of scalar type type.
The most common way that you will see shader authors declare vectors, however, is by using the name of a type followed by an integer from 1 to 4. To declare a 4-tuple of floats, for example, you could use any of the following vector declarations:

    float4 fVector0;
    vector fVector1;
    vector <float, 4> fVector2;
To declare a 3-tuple of bools, for example, you could use either of the following declarations:

    bool3 bVector0;
    vector <bool, 3> bVector1;
Once you have defined a vector, you may access its individual components by using array access syntax or a swizzle. In the swizzle case, the components must come from either the {x, y, z, w} or the {r, g, b, a} namespace (but not both). For example:
    float4 pos = {3.0f, 5.0f, 2.0f, 1.0f};
    float value0 = pos[0];   // value0 is 3.0f
    float value1 = pos.x;    // value1 is 3.0f
    float value2 = pos.g;    // value2 is 5.0f
    float2 vec0 = pos.xy;    // vec0 is {3.0f, 5.0f}
    float2 vec1 = pos.ry;    // INVALID because of bad swizzle
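The swizzle rule can be sketched in C++ as a CPU-side illustration, assuming a 4-component source vector. swizzle() is a hypothetical helper. It returns std::nullopt when the mask is empty, is longer than four letters, or mixes the {x, y, z, w} and {r, g, b, a} sets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Applies a swizzle mask such as "xy" or "bgr" to a 4-component vector.
std::optional<std::vector<float>> swizzle(const std::array<float, 4>& v,
                                          const std::string& mask) {
    const std::string xyzw = "xyzw";
    const std::string rgba = "rgba";
    if (mask.empty() || mask.size() > 4)
        return std::nullopt;

    // The first letter decides which name set the whole mask must use.
    const std::string* set = nullptr;
    if (xyzw.find(mask[0]) != std::string::npos)
        set = &xyzw;
    else if (rgba.find(mask[0]) != std::string::npos)
        set = &rgba;
    else
        return std::nullopt;

    std::vector<float> out;
    for (char c : mask) {
        std::size_t idx = set->find(c);
        if (idx == std::string::npos)
            return std::nullopt;  // letter from the other set, or not a component name
        out.push_back(v[idx]);
    }
    return out;
}
```

With pos = {3, 5, 2, 1}, the mask "xy" yields {3, 5} and "g" yields {5}. "ry" is rejected, matching the invalid line in the HLSL example.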
It should be noted that the ps_2_0 and lower pixel shader models do not have native support for arbitrary swizzles. Hence, concise high-level code which uses swizzles can result in fairly nasty binary asm when compiling to these targets. You should familiarize yourself with the native swizzles available in these assembly models.
Matrix Types
Another very common type of variable you will find yourself using in HLSL shaders is the matrix, a 2D array of data. Like scalars and vectors, matrices may be composed of any of the basic datatypes: bool, int, half, float or double. Matrices may have from 1 to 4 rows and from 1 to 4 columns. You will recall that the example vertex shader shown at the beginning of the chapter declared two 4 × 4 float matrices at global scope:

    float4x4 view_proj_matrix;
    float4x4 texture_matrix0;
Naturally, other dimensions of matrices can be used. For example, we could declare a floating-point matrix with 3 rows and 4 columns in either of the following ways:

    float3x4 mat0;
    matrix <float, 3, 4> mat1;
Like vectors, the individual elements of matrices can be accessed using array or structure/swizzle syntax. For example, the following array indexing syntax can be used to access the top-left element of the matrix view_proj_matrix:

    float fValue = view_proj_matrix[0][0];
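The row-then-column indexing just described can be sketched in C++ for a matrix with 3 rows and 4 columns. Float3x4, rowMajorOffset() and colMajorOffset() are hypothetical names. The two offset functions show where element (row, col) lands in a tightly packed array under row-by-row versus column-by-column storage. That is the distinction fxc's -Zpr and -Zpc options control, but the sketch ignores any register padding a real constant layout may add.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A 3-row, 4-column float matrix indexed as m[row][col]; m[row] alone is a row.
using Float4 = std::array<float, 4>;
using Float3x4 = std::array<Float4, 3>;

// Flat offset of (row, col) when rows are stored one after another.
std::size_t rowMajorOffset(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Flat offset of (row, col) when columns are stored one after another.
std::size_t colMajorOffset(std::size_t row, std::size_t col, std::size_t rows) {
    return col * rows + row;
}
```

For the elements 1 through 12 laid out row by row, element (1, 2) is the value 7. It sits at offset 6 in row-major storage and at offset 7 in column-major storage.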
There is also a structure syntax defined for access to and swizzling of matrix elements. For zero-based row-column position, you can use any of the following: