HLSL Introduction
High-Level Shading Language (HLSL) is Microsoft's programming language for the graphics processing unit (GPU) in DirectX 9, 10 and 11. It lets programmers build shaders with C-like syntax, types, expressions, statements, and functions.
Long ago, Pixar's RenderMan was a popular shading language used to generate cinematic effects on CPUs in render farms. More recently, Microsoft's High-Level Shading Language (HLSL) and the OpenGL Shading Language (GLSL) were developed for real-time shading on the GPU. HLSL is integrated into DirectX 9 and later, so it is tied to Microsoft platforms such as Windows. Similarly, OpenGL adopted GLSL, first as extensions alongside OpenGL 1.5 and then as a standard component from OpenGL 2.0 onward. These high-level languages have accelerated shader development.
To build a complete shader application, a GPU shading language must work together with a host programming language such as C/C++. The host program sets up the shader and its parameters on the CPU. Setting a large number of parameters is tedious, but C/C++ gives the best performance on the CPU. Other host languages can also be used to build shaders.
CUDA (Compute Unified Device Architecture) is a C language extension for general-purpose GPU programming. Unlike HLSL, it omits most of the 3D graphics features of a shading language. CUDA was developed by NVIDIA and runs on GeForce 8 series and later GPUs. HLSL 5.0 for DirectX 11 adds compute shaders, which provide CUDA-like GPGPU functionality that also works on AMD and Intel GPUs. More recently, OpenCL has emerged as a multi-platform alternative to CUDA.
struct a2v { float4 Position : POSITION; };
vs_1_1
dcl_position v0
m4x4 oPos, v0, c0
mov oD0, c4
This assembly program is the compiled code of the simplest HLSL example. First, vs_1_1 specifies vertex shader version 1.1. Second, dcl_position v0 declares that v0 is the position input register. The third statement performs a 4x4 matrix multiply. It computes four dot products of the input register v0 with the constant registers c0, c1, c2 and c3, and writes the result to the output position register oPos. Finally, the last statement moves the value of constant register c4 to the output diffuse color register oD0. In vertex shader assembly language, vN usually denotes an input register and oXXX an output register.
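To make the m4x4 instruction concrete, here is a small CPU-side sketch in C. HLSL and shader assembly need a GPU to run, so C is used instead. Registers are modeled as four-float structs. The names reg4, dp4 and m4x4 are illustrative and are not Direct3D API. The sketch assumes HLSL's default column-major packing for mul(v, M): c0 to c3 then hold the columns of M, so each dot product yields one component of the row-vector product.

```c
/* CPU sketch of the vs_1_1 instruction "m4x4 oPos, v0, c0".
   Registers are modeled as four floats; reg4, dp4 and m4x4 are
   illustrative names, not Direct3D API. */
typedef struct { float x, y, z, w; } reg4;

/* Four-component dot product, like the dp4 instruction. */
float dp4(reg4 a, reg4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* m4x4 expands to four dp4 operations against the consecutive
   constant registers c[0], c[1], c[2] and c[3]. */
reg4 m4x4(reg4 v, const reg4 c[4])
{
    reg4 out = { dp4(v, c[0]), dp4(v, c[1]), dp4(v, c[2]), dp4(v, c[3]) };
    return out;
}
```

Take c0 to c3 as (1, 0, 0, 5), (0, 1, 0, 6), (0, 0, 1, 7) and (0, 0, 0, 1). With those registers, the vertex (2, 3, 4, 1) is translated to (7, 9, 11, 1).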
float4x4 ModelViewProj; float4x4 ModelViewIT; float4 LightVec;
These are global variables.
struct a2v { float4 Position : POSITION; float4 Normal : NORMAL; };
The Normal member is added for the lighting (color) computation.
struct v2p { float4 Position : POSITION; float4 Color : COLOR0; };
Color is output to the pixel shader.
void main( in a2v IN, out v2p OUT ) {
The shader reads the global parameters: the model-view-projection matrix ModelViewProj, the inverse transpose of the model-view matrix ModelViewIT (used to transform normals), and the light vector LightVec.
OUT.Position = mul(IN.Position, ModelViewProj);
Multiply the position by the model-view-projection matrix to transform it to clip space.
float4 normal = mul(IN.Normal, ModelViewIT);
normal.w = 0.0;
normal = normalize(normal);
float4 light = normalize(LightVec);
float4 eye = float4(1.0, 1.0, 1.0, 0.0);
float4 vhalf = normalize(light + eye);
Transform the normal from model space to view space, normalize the light vector, and calculate the half-angle vector. float4(1.0, 1.0, 1.0, 0.0) is a vector constructor that initializes the float4 vector eye.
Swizzle operators select, reorder, or replicate vector components. For example, v.xyzz builds a four-component vector whose last component w is a copy of v.z.
float diffuse = max(dot(normal, light), 0.0);
float specular = max(dot(normal, vhalf), 0.0);
specular = pow(specular, 32);
Calculate the diffuse and specular components with the dot product and the pow function. Negative dot products (surfaces facing away from the light) are clamped to zero with max, because pow is undefined for a negative base.
float4 diffuseMaterial = float4(0.5, 0.5, 1.0, 1.0); float4 specularMaterial = float4(0.5, 0.5, 1.0, 1.0);
Set the diffuse and specular material colors.
OUT.Color = diffuse*diffuseMaterial + specular*specularMaterial; }
Add the diffuse and specular components and output the final vertex color.
To better understand swizzle operators, compare the component-wise definition of the cross product
float3 cross( float3 a, float3 b ) { return float3( a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x ); }
and its swizzle implementation
float3 cross( float3 a, float3 b ) { return a.yzx*b.zxy - a.zxy*b.yzx; }
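C has no swizzles, but the equivalence of the two forms can still be checked there. In the sketch below, the helpers yzx and zxy emulate the HLSL swizzles a.yzx and a.zxy. All names are illustrative.

```c
typedef struct { float x, y, z; } float3;

/* Component-wise definition of the cross product. */
float3 cross_explicit(float3 a, float3 b)
{
    float3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* Emulations of the HLSL swizzles v.yzx and v.zxy. */
static float3 yzx(float3 v) { float3 r = { v.y, v.z, v.x }; return r; }
static float3 zxy(float3 v) { float3 r = { v.z, v.x, v.y }; return r; }

static float3 mul3(float3 a, float3 b) { float3 r = { a.x * b.x, a.y * b.y, a.z * b.z }; return r; }
static float3 sub3(float3 a, float3 b) { float3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }

/* Same result as a.yzx*b.zxy - a.zxy*b.yzx in HLSL. */
float3 cross_swizzle(float3 a, float3 b)
{
    return sub3(mul3(yzx(a), zxy(b)), mul3(zxy(a), yzx(b)));
}
```

Both functions return (-3, 6, -3) for a = (1, 2, 3) and b = (4, 5, 6).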
Draw Texture
Now we show how to add a texture to a surface. The vertex shader passes the texture coordinates through to the pixel shader.
float4x4 ModelViewProj; float4x4 ModelViewIT; float4 LightVec;
Global variables.
struct a2v { float4 Position : POSITION; float2 Texcoord : TEXCOORD0; float4 Normal : NORMAL; };
Add a new Texcoord member, bound to the TEXCOORD0 semantic, as the texture coordinate.
struct v2p { float4 Position : POSITION; float2 Texcoord : TEXCOORD0; float4 Color : COLOR0; };
Add Texcoord to the output.
void main( in a2v IN, out v2p OUT ) {
OUT.Position = mul(IN.Position, ModelViewProj);
// ... same as the above example
Set the texture coordinate.
OUT.Texcoord = IN.Texcoord; }
The code is the same as the above example, except that the texture coordinate copy is added.
Pixel Shader
The pixel shader computes the final color of each pixel.
float brightness; sampler2D tex0; sampler2D tex1;
brightness controls the brightness of the light. tex0 and tex1 are samplers for two textures.
struct v2p { float4 Position : POSITION; float2 Texcoord0 : TEXCOORD0; float2 Texcoord1 : TEXCOORD1; float4 Color : COLOR0; };
v2p declares a struct type that transfers data from the vertex shader to the pixel shader. Its members must match the outputs of the vertex shader, which are linked by their semantics. The input semantics of a pixel shader can be COLORn for colors or TEXCOORDn for texture coordinates. Although the struct v2p corresponds to the v2p in the vertex shader, the member Position cannot be read in the pixel shader, because it is not bound to any pixel shader input semantic.
struct p2f { float4 Color : COLOR0; };
p2f declares the output data structure, and OUT is the output object. The output semantics of a pixel shader can be COLORn for the color of render target n and/or DEPTH for the depth value.
void main( in v2p IN, out p2f OUT ) {
The constant parameter brightness has float type. sampler2D specifies a 2D texture unit. To access a texture, you must use a sampler with a texture intrinsic function such as tex2D. A sampler can be used multiple times.
float4 color = tex2D(tex0, IN.Texcoord0); float4 bump = tex2D(tex1, IN.Texcoord1);
Fetch the texture color and the bump value for further computation of a bump effect (bump is fetched but not used in this simplified example). tex2D is a texture sampling intrinsic function of HLSL. It returns a vector from a texture sampler and a texture coordinate.
OUT.Color = brightness * IN.Color * color; }
The code multiplies brightness, IN.Color and color to generate the output RGBA color vector.
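The C sketch below imitates what tex2D and the final multiply do for a single pixel. It assumes point (nearest-texel) sampling with clamp addressing. A real sampler may filter and wrap differently depending on its sampler state. texture2d, tex2d_point and modulate are hypothetical names.

```c
typedef struct { float r, g, b, a; } rgba;

/* Hypothetical CPU texture: width x height texels stored row by row. */
typedef struct { int width, height; const rgba *texels; } texture2d;

static int clamp_index(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

/* Nearest-texel lookup at texture coordinate (u, v) with clamp addressing. */
rgba tex2d_point(const texture2d *t, float u, float v)
{
    int x = clamp_index((int)(u * t->width), t->width);
    int y = clamp_index((int)(v * t->height), t->height);
    return t->texels[y * t->width + x];
}

/* brightness * IN.Color * color, component by component. */
rgba modulate(float brightness, rgba vertexColor, rgba texColor)
{
    rgba out = {
        brightness * vertexColor.r * texColor.r,
        brightness * vertexColor.g * texColor.g,
        brightness * vertexColor.b * texColor.b,
        brightness * vertexColor.a * texColor.a
    };
    return out;
}
```

In a 2 x 2 texture, the coordinate (0.75, 0.25) selects the texel in column 1, row 0. The final color is then that texel scaled by the vertex color and brightness.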
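The next vertex shader prepares tangent-space lighting for bump mapping. It adds a Tangent member to the input structure and outputs two texture coordinates. It also reads the globals ModelViewProj, ModelWorld, vLight, vEye, power, vDiffuseMaterial, vSpecularMaterial and vAmbient, whose declarations are not shown.
struct a2v { float4 Position : POSITION; float2 Texcoord : TEXCOORD0; float4 Normal : NORMAL; float4 Tangent : TANGENT; };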
struct v2p { float4 Position : POSITION; float4 Color : COLOR0; float2 Texcoord0 : TEXCOORD0; float2 Texcoord1 : TEXCOORD1; };
Define the main function and transform the position to clip space.
void main( in a2v IN, out v2p OUT ) { OUT.Position = mul(IN.Position, ModelViewProj);
Transform the tangent and normal from model space to world space, normalize them, and compute the binormal.
float4 tangent = normalize(mul(float4(IN.Tangent.xyz, 0.0), ModelWorld));
float4 normal = normalize(mul(float4(IN.Normal.xyz, 0.0), ModelWorld));
float3 binormal = cross(normal.xyz, tangent.xyz);
Compute the vertex position in world space.
float4 posWorld = mul(IN.Position, ModelWorld);
Compute the normalized light vector (from the surface to the light), the eye vector (from the surface to the eye), and the half-angle vector.
float4 light = normalize(vLight - posWorld);
float4 eye = normalize(vEye - posWorld);
float4 vhalf = normalize(light + eye);
Transform the light and vhalf vectors to tangent space.
float3 L = float3(dot(tangent.xyz, light.xyz), dot(binormal, light.xyz), dot(normal.xyz, light.xyz));
float3 H = float3(dot(tangent.xyz, vhalf.xyz), dot(binormal, vhalf.xyz), dot(normal.xyz, vhalf.xyz));
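Calculate the diffuse and specular components. In tangent space the unperturbed surface normal is (0, 0, 1), so the dot products with L and H reduce to their z components.
float diffuse = saturate(L.z);
float specular = pow(saturate(H.z), power);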
Combine the diffuse and specular contributions, output the final vertex color, and set the texture coordinates.
OUT.Color = 2.0*(diffuse*vDiffuseMaterial + specular*vSpecularMaterial) + 0.5 + vAmbient;
OUT.Texcoord0 = IN.Texcoord;
OUT.Texcoord1 = IN.Texcoord; }
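The tangent-space step is just three dot products against the tangent frame. The C sketch below shows this transformation as it is used for L and H, with the binormal built as cross(normal, tangent) as in the shader. It assumes the tangent and normal are unit length and perpendicular. The names are illustrative.

```c
typedef struct { float x, y, z; } vec3;

float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

vec3 cross3(vec3 a, vec3 b)
{
    vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* Express world-space vector v in the tangent frame (T, B, N).
   Assumes T and N are unit length and perpendicular, so B is too. */
vec3 to_tangent_space(vec3 v, vec3 tangent, vec3 normal)
{
    vec3 binormal = cross3(normal, tangent);
    vec3 r = { dot3(tangent, v), dot3(binormal, v), dot3(normal, v) };
    return r;
}
```

Take T = (1, 0, 0) and N = (0, 0, 1). The binormal is then (0, 1, 0), and a vector along N maps to (0, 0, 1). The z component therefore measures how directly the light faces the surface.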
Image Processing (1): Sobel Edge Filter
This program implements the Sobel edge filter with an HLSL pixel shader. Many other image filters can be implemented in a pixel shader in a similar way.
float Brightness; sampler2D tex0;
Global variables.
struct v2p { float4 Position : POSITION; float2 Texcoord : TEXCOORD0; float4 Color : COLOR0; }; struct p2f { float4 Color : COLOR0; };
The main function takes the v2p structure as input and writes the p2f structure, whose Color member has float4 type with the semantic COLOR0. Brightness (a float that controls brightness) and tex0 (a 2D texture sampler) are global variables.
void main( in v2p IN, out p2f OUT ) {
const declares constants. c[NUM] is a constant float2 array of sampling offsets, and its initialization uses C-like syntax. Each offset is 0.0078125 = 1/128, which is one texel if the texture is 128 texels wide and high. col[NUM] is a float3 array with NUM elements, and int i declares i as an integer. These constructs require pixel shader 2.0 or later.
const int NUM = 9;
const float threshold = 0.05;
const float2 c[NUM] = {
    float2(-0.0078125,  0.0078125), float2( 0.0,  0.0078125), float2( 0.0078125,  0.0078125),
    float2(-0.0078125,  0.0      ), float2( 0.0,  0.0      ), float2( 0.0078125,  0.0      ),
    float2(-0.0078125, -0.0078125), float2( 0.0, -0.0078125), float2( 0.0078125, -0.0078125)
};
float3 col[NUM];
int i;
Sample the texture at the nine offsets and store the colors in the col array.
for (i = 0; i < NUM; i++) {
    col[i] = tex2D(tex0, IN.Texcoord.xy + c[i]).rgb;
}
Now compute the luminance of each sample with a dot product and store it in the lum array.
float3 rgb2lum = float3(0.30, 0.59, 0.11);
float lum[NUM];
for (i = 0; i < NUM; i++) {
    lum[i] = dot(col[i].xyz, rgb2lum);
}
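The Sobel filter computes a new value at the central position by summing the weighted neighbors. x is the horizontal gradient (right column minus left column), and y is the vertical gradient (the row at offset -1/128 minus the row at +1/128).
float x = lum[2] + lum[8] + 2*lum[5] - lum[0] - 2*lum[3] - lum[6];
float y = lum[6] + 2*lum[7] + lum[8] - lum[0] - 2*lum[1] - lum[2];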
Show the points whose squared gradient magnitude exceeds the threshold and hide the others. The final result is the product of the central sample col[4] (offset (0, 0)) and the edge detector value. Brightness adjusts the brightness of the image.
float edge = (x*x + y*y > threshold) ? 1.0 : 0.0;
Final output:
OUT.Color.rgb = Brightness * col[4] * edge.xxx;
OUT.Color.a = 1.0; }
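The filter arithmetic can be tested on the CPU with a 3 x 3 block of luminance values, using the C sketch below. The block is stored in the same order as the shader's c[] offsets, so index 4 is the center. The edge rule follows the text: a pixel is an edge where the squared gradient magnitude exceeds the threshold. Function names are illustrative.

```c
/* Luminance weights used by the shader (rgb2lum). */
float luminance(float r, float g, float b)
{
    return 0.30f * r + 0.59f * g + 0.11f * b;
}

/* lum[] holds a 3 x 3 neighborhood in the order of the shader's c[]
   offsets: indices 0-2, 3-5 and 6-8 are rows, index 4 is the center. */
float sobel_magnitude_sq(const float lum[9])
{
    float x = lum[2] + lum[8] + 2 * lum[5] - lum[0] - 2 * lum[3] - lum[6];
    float y = lum[6] + 2 * lum[7] + lum[8] - lum[0] - 2 * lum[1] - lum[2];
    return x * x + y * y;
}

/* 1.0 on an edge (squared gradient above threshold), 0.0 elsewhere. */
float edge_mask(const float lum[9], float threshold)
{
    return sobel_magnitude_sq(lum) > threshold ? 1.0f : 0.0f;
}
```

A uniform patch has zero gradient. A dark left column next to bright pixels gives x = 4 and y = 0, so the squared magnitude is 16.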
Image Processing (2): Mask
The mask transform is a well-known example that combines three media streams. A mask is a black-and-white image. The inputs are two media streams and a mask picture. The result is the first stream multiplied by the mask plus the second stream multiplied by the inverse of the mask (1 - mask). Therefore, each stream is shown in the area selected by the shape of the mask.
sampler2D tex[3]; struct v2p { float4 Position : POSITION; float4 Color : COLOR0; float2 Texcoord0 : TEXCOORD0; float2 Texcoord1 : TEXCOORD1; float2 Texcoord2 : TEXCOORD2; };
These are the texture samplers and the vertex-to-pixel structure.
void main(in v2p IN, out float4 OUT : COLOR) {
float Brightness = 1.2345;
float alpha = 0.96;
float3 color0 = tex2D(tex[0], IN.Texcoord0).rgb;
float3 color1 = tex2D(tex[1], IN.Texcoord1).rgb;
float3 mask = tex2D(tex[2], IN.Texcoord2).rgb;
The blend formula translates directly into HLSL:
OUT.rgb = Brightness *(color0 * mask + color1 * (1.0-mask)); OUT.a = 1.0; }
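The C sketch below applies the mask formula to one pixel. The struct and function names are illustrative. A mask value of 1 selects the first stream, 0 selects the second, and values in between mix the two.

```c
typedef struct { float r, g, b; } rgb;

/* Brightness * (color0 * mask + color1 * (1 - mask)), per channel. */
rgb mask_blend(rgb color0, rgb color1, rgb mask, float brightness)
{
    rgb out = {
        brightness * (color0.r * mask.r + color1.r * (1.0f - mask.r)),
        brightness * (color0.g * mask.g + color1.g * (1.0f - mask.g)),
        brightness * (color0.b * mask.b + color1.b * (1.0f - mask.b))
    };
    return out;
}
```

A gray mask value of 0.5 gives an even mix, so a soft-edged mask produces a smooth transition between the streams. In HLSL the same expression can also be written as lerp(color1, color0, mask).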
Shader Model 3.0
Many posts discuss Shader Model 3.0. The following HLSL code cannot be compiled for Shader Model 2.0, but it should compile for Shader Model 3.0. The error message from the Shader Model 2.0 compile shows a difference between SM2 and SM3.
sampler2D tex0; sampler2D tex1;
Texture samplers.
struct v2p { float4 Position : POSITION; float4 Color : COLOR0; float2 Texcoord0 : TEXCOORD0; float2 Texcoord1 : TEXCOORD1; };
Pixel shader input (the structure passed from the vertex shader).
void main(in v2p IN, out float4 OUT : COLOR) {
Pixel shader main function.
float Brightness = 1.1; int i, j; float2 tc = IN.Texcoord0; float b[5]; float2 s[5]; float3 col[5];
Define variables and arrays.
float3 rgb2lum = float3(0.30f, 0.59f, 0.11f); s[0] = float2( 0.0f, 0.0f ); s[1] = float2( 0.0f, 0.0078125f ); s[2] = float2( 0.0078125f, 0.0f ); s[3] = -s[1]; s[4] = -s[2];
Set the sampling offsets: the center texel and its four neighbors at a distance of 0.0078125 (1/128) in texture coordinates.
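for (i = 0; i < 5; i++) {
    col[i] = tex2D(tex0, tc + s[i]).rgb;
    b[i] = dot(col[i].xyz, rgb2lum);
}
float flag[4];
float r;
for (i = 0; i < 4; i++) {
    r = -1.0f;
    for (j = 0; j < 5; j++) {
        r += step(b[i], b[j]);
    }
    flag[i] = r;
}
Compute the flag array. step(b[i], b[j]) returns 1 when b[j] >= b[i], so flag[i] counts how many of the other four samples are at least as bright as sample i.
OUT.xyz = Brightness * (( flag[0] == 2.0 ) ? col[0] :
                       (( flag[1] == 2.0 ) ? col[1] :
                       (( flag[2] == 2.0 ) ? col[2] :
                       (( flag[3] == 2.0 ) ? col[3] : col[4] ))));
OUT.w = 1.0; }
Final output. A sample whose flag equals 2 has two samples at least as bright and two darker ones, so it is the median of the five samples. If none of the first four qualifies, col[4] is the median, assuming the luminance values are distinct. The shader is therefore a five-point median filter.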
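The counting trick can be reproduced on the CPU with the C sketch below. It uses illustrative names and assumes HLSL's step(y, x) semantics, which return 1 when x >= y and 0 otherwise. The function returns the index of the sample that the shader would output.

```c
/* HLSL step(y, x): 1 when x >= y, otherwise 0. */
float step_f(float y, float x) { return x >= y ? 1.0f : 0.0f; }

/* Index of the median of five luminance values, using the shader's
   counting trick. Falls back to index 4, which is correct when the
   values are distinct. */
int median5_index(const float b[5])
{
    for (int i = 0; i < 4; i++) {
        float r = -1.0f;                 /* cancels the j == i term */
        for (int j = 0; j < 5; j++)
            r += step_f(b[i], b[j]);
        if (r == 2.0f)                   /* exactly two others are >= b[i] */
            return i;
    }
    return 4;
}
```

For the luminances {0.9, 0.1, 0.5, 0.3, 0.7}, the median 0.5 is at index 2. The shader also picks the first sample whose flag is 2, so the early return gives the same result.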
Video Mixing (1): Vertex Shader
Today, powerful GPUs and multithreaded engines make real-time video mixing on the GPU possible. Below we describe the default vertex shader of the Ladybug Mixer Vn.
float4x4 WorldViewProj; float4x4 WorldViewIT; float4 LightPos; float4 EyePos; float fTime;
WorldViewProj is the transform matrix from model space to clip space. WorldViewIT is the inverse transpose of the world-view matrix, used to transform normals to view space. LightPos is the light position and EyePos is the eye position in model space. WorldViewIT, LightPos and EyePos are not used in this shader and are reserved for later use.
fTime is the application time, which can be used to control effects along the timeline.
#define NUMTEXCOORD 4
struct a2v { float3 Position : POSITION; float4 Diffuse : COLOR; float2 TexCoord[NUMTEXCOORD] : TEXCOORD0; };
This is the input structure from application to the vertex program (a2v). Position is the position in model space. Diffuse specifies the diffuse color. TexCoord specifies texture coordinates where video streams are represented as textures.
The number of textures, NUMTEXCOORD, depends on the graphics hardware. Four is the minimum, and some cards can sample up to 16 textures.
struct v2p { float4 Position : POSITION; float4 Color : COLOR0; float2 Texcoord[NUMTEXCOORD] : TEXCOORD0; };
This is the structure from vertex shader to pixel shader (v2p).
void main(a2v IN, out v2p OUT) {
    OUT = (v2p) 0;
    OUT.Position = mul(float4(IN.Position, 1.0f), WorldViewProj);
    for (int i = 0; i < NUMTEXCOORD; i++) {
        OUT.Texcoord[i] = IN.TexCoord[i];
    }
}
The main program extends the position to a homogeneous vector with w = 1.0 and transforms it from model space to clip space. It then copies the texture coordinates to the output.
Video Mixing (2): Pixel Shader
A basic wipe transform is like overlaying two paper slips. In the middle, where the two pictures overlap, they are mixed with alpha values that depend on the position.
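The C sketch below models the basic wipe as a position-dependent alpha. Alpha is zero before an overlap band, one after it, and ramps linearly inside it. The band parameters start and width, and all helper names, are hypothetical, because the shader code in this section implements only the ring variant.

```c
/* Clamp to [0, 1], like HLSL saturate(). */
float saturate_f(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* Linear interpolation, like HLSL lerp(a, b, t). */
float lerp_f(float a, float b, float t) { return a + (b - a) * t; }

/* Hypothetical horizontal wipe: u is the horizontal texture coordinate,
   [start, start + width] is the overlap band where the pictures mix. */
float wipe_alpha(float u, float start, float width)
{
    return saturate_f((u - start) / width);
}

/* Mix one color channel of picture 0 and picture 1 at position u. */
float wipe_channel(float c0, float c1, float u, float start, float width)
{
    return lerp_f(c0, c1, wipe_alpha(u, start, width));
}
```

Animating start over time, for example from a time parameter, would slide the overlap band across the screen.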
A ring wipe defines a ring centered in the first picture. Between the inner and outer circles, the two pictures are mixed with alpha values determined by the distance to the center. The area inside the inner circle belongs to the first picture, and the area outside the outer circle belongs to the second picture.
sampler2D tex[2]; struct v2p { float4 Position : POSITION; float4 Color : COLOR0; float2 Texcoord[2] : TEXCOORD0; };
tex is an array of 2D samplers that represent the video streams. v2p is the structure passed from the vertex shader to the pixel shader.
void main(in v2p IN, out float4 OUT : COLOR) {
float Brightness = 1.2343;
float len = 0.3;
float a = 0.2;
float b = a + len;
float2 center = float2(0.5, 0.5);
float3 result;
Brightness controls the brightness of the light. a and b are the radii of the inner and outer circles respectively, and len is the width of the ring. center is the coordinate of the ring's center. result holds the color produced by the effect.
float3 color0 = tex2D(tex[0], IN.Texcoord[0]).rgb;
float3 color1 = tex2D(tex[1], IN.Texcoord[1]).rgb;
float2 uv = IN.Texcoord[0];
color0 and color1 are the colors sampled from the first and second pictures respectively. uv is the current texture coordinate.
float dist = distance(uv, center);
if (dist < a) {
    result = color0;
} else if (dist > b) {
    result = color1;
} else {
    result = lerp(color0, color1, saturate((dist - a) / len));
}
This is the kernel of the algorithm. distance is a built-in HLSL function. (dist - a) / len maps dist from [a, b] to [0, 1], and saturate clamps the value to [0, 1]. lerp is the built-in function for linear interpolation.
OUT.rgb = Brightness * result; OUT.a = 1.0; }
OUT holds the final color.
The ring wipe is one of the most interesting transforms because the position and size of the ring can vary. Furthermore, several ring wipes with different centers can be applied in one shader. At this stage Shader Model 3.0, with its higher instruction limits and dynamic flow control, is very helpful. Who said that software always lags behind hardware?