Uniform Buffer Objects (UBO) using the std140 layout specification

If we have information we need to set for multiple programs, we can either set the uniform each time we use a new program:

// Global Variables
int programID;
int uniformLocation;
 
// Done after successful program linking
uniformLocation = GL.GetUniformLocation( programID, uniformName ); // Gets the uniform variable Location
 
// Done at render stage (GLControl Paint Event / GameWindow OnRenderFrame Event / Wherever your rendering is done using FBO, etc.)
GL.UseProgram( programID ); // Sets the current shader program
GL.Uniform4( uniformLocation, ref uniformVariable ); // Sets the uniform value for the program's use, in this case a Vector4

Or we could store the information in a UBO, point the shader programs at it, and have them read from there.

The advantage shows when we have a lot of information (e.g. a list of lights, materials, etc.): the number of calls needed to set this on a per-program level can become enormous and generate a heavy amount of undesired overhead. One solution is to use Uniform Buffer Objects, which are set once per frame or once per load, depending on the use.

Here I will only cover the layout std140 specification defined in the OpenGL 3.3 Specification (Section 2.11.4, Pg71).

std140 specifies a layout which is implementation independent. The other layouts are implementation dependent and require gathering information and formatting your buffers accordingly; to get started, however, std140 will do fine. (Note: std140 defines a specific way to lay out the buffer; it may not necessarily be the best or most optimized way to use the buffer.)

Discussion of the std140 layout:

According to the specification referenced above, alignment is expressed in multiples of N, where N is the size of a basic data type. The basic data types are bool, float, int and uint, and each fits into a single DWORD (4 bytes). The block works in units of 4N, so in essence it aligns to 4 x (bool|float|int|uint), which is 16 bytes.

The shader variable alignment is as follows (I'll only cover the basic floats here, but the principle applies all round). Note that a vec3 aligns to 4N even though it only occupies 3N:
vec4 - 4N
vec3 - 4N
vec2 - 2N
float - N

Best way to explain this is with a picture :)

If we have a Data Block of 8N

NNNNNNNN

The layout states that everything works with an alignment of 4N, so we get this:

NNNN
NNNN

Simply put, we have chunks of 4N to work with, and it is wise to fit our data into those chunks. Anything that would cross a chunk boundary is placed into a new chunk.

e.g.
If we have float values for N we can have:

1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f

which will break into chunks like this

1.0f, 2.0f, 3.0f, 4.0f, 
5.0f, 6.0f, 7.0f, 8.0f

Now, to see how this fits into variables, I can have a float array in C#

float[] UBOData = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };

and have a uniform block in my shader defined as

layout(std140) uniform UBOData {
	float value1;
	float value2;
	float value3;
	float value4;
	float value5;
	float value6;
	float value7;
	float value8;
};

This will place the array of data into the relevant slots sequentially, as expected. However, this is certainly not very useful, so we use some proper variables, like so:

layout(std140) uniform UBOData {
	vec4 firstHalfValues;
	vec4 secondHalfValues;
};

More useful yes, but how do the values look in the vectors?

firstHalfValues looks like this (1.0f, 2.0f, 3.0f, 4.0f)
and
secondHalfValues looks like this (5.0f, 6.0f, 7.0f, 8.0f)

So where do the alignment and boundaries come into play? Well, if we change the uniform block to this:

layout(std140) uniform UBOData {
	vec3 firstValue;
	vec4 secondValue;
	float thirdValue;
};

Looking at this, we are still defining 8 floats (vec3 = 3, vec4 = 4, float = 1), so one might expect the result to be this:

firstValue = (1.0f, 2.0f, 3.0f)
secondValue = (4.0f, 5.0f, 6.0f, 7.0f)
thirdValue = 8.0f

But it is not so, the actual values end up as such:

firstValue = (1.0f, 2.0f, 3.0f)
secondValue = (5.0f, 6.0f, 7.0f, 8.0f)
thirdValue = 0.0f

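The first variable is as expected, but the second and third are not. The reason is the 4N alignment in the spec: if the next variable defined in a block cannot fit within the remainder of the current chunk, it is aligned to the start of the next chunk. More precisely, each variable starts at the next multiple of its own alignment from the list above, which for a vec3 or vec4 is always the start of a chunk. thirdValue therefore lands on the ninth float, past the end of the eight floats we supplied, so its contents are not defined by our data (it read as 0.0f here).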

To show the calculation it goes something like this:

Start of block
firstValue has a size of 3 floats, chunk has 4 floats available, so there is 1 remainder in the chunk
secondValue has a size of 4 floats, chunk has 1 float available, so skip the remainder and start at the next chunk
thirdValue has a size of 1 float, chunk has 0 floats available, so move to the beginning of the next chunk
End of Block

As can be seen here, a total of 3 chunks are used, one for each variable. Knowing this, we can correct the input array by padding it where the layout is expected to skip, like so:

float[] UBOData = { 1.0f, 2.0f, 3.0f, 0.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };

This will give the values as we expect them, like this:

firstValue = (1.0f, 2.0f, 3.0f)
secondValue = (4.0f, 5.0f, 6.0f, 7.0f)
thirdValue = 8.0f

Alternatively, we could change the array as follows

float[] UBOData = { 1.0f, 2.0f, 3.0f, 8.0f, 4.0f, 5.0f, 6.0f, 7.0f };

and change the shader's block definition to this:

layout(std140) uniform UBOData {
	vec3 firstValue;
	float thirdValue;
	vec4 secondValue;
};

Here no padding is needed: by changing the order of the data and of the block variables, we still get the desired result. The calculation for this layout is as follows:

Start of block
firstValue has a size of 3 floats, chunk has 4 floats available, so there is 1 remainder in the chunk
thirdValue has a size of 1 float, chunk has 1 float available, so fill the variable
secondValue has a size of 4 floats, chunk has 0 floats available, so move to the beginning of the next chunk
End of Block

As can be seen, this is now only using 2 chunks according to the rules.
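
The two walk-throughs above can be reproduced with a small helper. This is only a sketch under stated assumptions: the names Std140Sketch, Offsets and ChunksUsed are made up for illustration, and it handles only the four float types listed earlier (no arrays, matrices or nested structs). All values are in units of N (one float).

```csharp
using System.Collections.Generic;

static class Std140Sketch
{
    // Alignment and size of each float type, in units of N (one float).
    static readonly Dictionary<string, (int Align, int Size)> Rules =
        new Dictionary<string, (int Align, int Size)>
        {
            { "float", (1, 1) },
            { "vec2",  (2, 2) },
            { "vec3",  (4, 3) },
            { "vec4",  (4, 4) },
        };

    // Offset of every member from the start of the block, in floats.
    public static int[] Offsets(params string[] members)
    {
        var offsets = new int[members.Length];
        int cursor = 0;
        for (int i = 0; i < members.Length; i++)
        {
            var (align, size) = Rules[members[i]];
            // Round the cursor up to the member's own alignment.
            cursor = (cursor + align - 1) / align * align;
            offsets[i] = cursor;
            cursor += size;
        }
        return offsets;
    }

    // Number of 4N chunks the members touch.
    public static int ChunksUsed(params string[] members)
    {
        if (members.Length == 0) return 0;
        int[] offsets = Offsets(members);
        int last = members.Length - 1;
        int end = offsets[last] + Rules[members[last]].Size;
        return (end + 3) / 4;
    }
}
```

Calling Std140Sketch.Offsets("vec3", "vec4", "float") yields offsets 0, 4 and 8 (3 chunks), while Std140Sketch.Offsets("vec3", "float", "vec4") yields 0, 3 and 4 (2 chunks), matching the calculations above.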

A more structured approach

When filling uniform blocks, it is a lot more useful to take an approach which does not rely on float arrays as input data. We can use a struct in C# to define our data in a friendlier manner, and then match that struct in the shader.

For example our shader defines:

layout(std140) uniform UBOData {
	vec3 firstValue;
	float thirdValue;
	vec4 secondValue;
};

and our C# struct will look like this:

[Serializable]
[StructLayout(LayoutKind.Sequential)]
struct UBOData {
    public Vector3 firstValue;
    public float thirdValue;
    public Vector4 secondValue;
};
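
One way to convince yourself that the C# struct lines up with the shader block is to ask the runtime for its field offsets. The sketch below is an assumption-laden illustration: Vec3 and Vec4 are stand-ins for OpenTK's Vector3 and Vector4 (so the example compiles without OpenTK), and UBODataCheck and LayoutCheck are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-ins for OpenTK's Vector3 / Vector4 (assumed to be plain sequential floats).
[StructLayout(LayoutKind.Sequential)]
struct Vec3 { public float X, Y, Z; }

[StructLayout(LayoutKind.Sequential)]
struct Vec4 { public float X, Y, Z, W; }

// Same field order as the UBOData struct above.
[StructLayout(LayoutKind.Sequential)]
struct UBODataCheck
{
    public Vec3 firstValue;
    public float thirdValue;
    public Vec4 secondValue;
}

static class LayoutCheck
{
    // Byte offset of a field inside UBODataCheck.
    public static int OffsetOf(string field) =>
        (int)Marshal.OffsetOf<UBODataCheck>(field);

    // Total size of the struct in bytes.
    public static int Size => Marshal.SizeOf<UBODataCheck>();
}
```

With this layout, LayoutCheck.OffsetOf reports 0 for firstValue, 12 for thirdValue and 16 for secondValue, and LayoutCheck.Size is 32. Those are the same byte positions the std140 rules give for vec3, float, vec4.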

This allows us to load the UBO with the struct, knowing it will match correctly. We can also let the C# struct and the uniform block differ in how they name and group their fields, as long as the bytes still line up.
For example, take a light structure where the uniform is expected to hold the light position/direction in the first 3 components of a vec4, with the fourth component set to 0 for a directional light or 1 for a point light, followed by a second vec4 for the intensity.
The definition in the shader will look like this:

layout(std140) uniform Light {
	vec4 dirPosType;
	vec4 intensity;
};

and the matching C# struct would look like this:

[Serializable]
[StructLayout(LayoutKind.Sequential)]
struct Light {
    public Vector4 dirPosType;
    public Vector4 intensity;
};

This is all well and good, as the structure matches correctly. However, in the dev environment you will need to remember what's what in the dirPosType variable in C#. We could change the structure to look like this and still stay in line with what the shader expects:

[Serializable]
[StructLayout(LayoutKind.Sequential)]
struct Light {
    public Vector3 dirPos;
    public float type;
    public Vector4 intensity;
};

This makes it a bit more readable within the C# code.
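
To illustrate that both C# variants of the Light struct produce the same bytes, the sketch below serializes one of each and compares them. It is an illustration only: Vec3 and Vec4 stand in for OpenTK's Vector3 and Vector4, the two structs are renamed LightPacked and LightReadable so they can coexist, and LightLayoutDemo is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Runtime.InteropServices;

// Stand-ins for OpenTK's Vector3 / Vector4 (assumed to be plain sequential floats).
[StructLayout(LayoutKind.Sequential)]
struct Vec3 { public float X, Y, Z; }

[StructLayout(LayoutKind.Sequential)]
struct Vec4 { public float X, Y, Z, W; }

[StructLayout(LayoutKind.Sequential)]
struct LightPacked { public Vec4 dirPosType; public Vec4 intensity; }

[StructLayout(LayoutKind.Sequential)]
struct LightReadable { public Vec3 dirPos; public float type; public Vec4 intensity; }

static class LightLayoutDemo
{
    // Copies a struct into a byte array using the marshaller.
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf<T>();
        byte[] bytes = new byte[size];
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, ptr, false);
            Marshal.Copy(ptr, bytes, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
        return bytes;
    }

    // Builds the same point light in both representations and compares the bytes.
    public static bool SameBytes()
    {
        var intensity = new Vec4 { X = 0.5f, Y = 0.5f, Z = 0.5f, W = 1f };
        var packed = new LightPacked
        {
            dirPosType = new Vec4 { X = 1f, Y = 2f, Z = 3f, W = 1f },
            intensity = intensity,
        };
        var readable = new LightReadable
        {
            dirPos = new Vec3 { X = 1f, Y = 2f, Z = 3f },
            type = 1f,
            intensity = intensity,
        };
        return ToBytes(packed).SequenceEqual(ToBytes(readable));
    }
}
```

LightLayoutDemo.SameBytes() returns true, since both structs are 32 bytes with the type value sitting in the fourth float either way.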

Now off to some code :)

To get a uniform block going, a few steps need to be completed, namely:

Create and set up a Uniform Buffer Object
Bind the Uniform Buffer Object to a Buffer Index
Bind the Program Uniform Block to the Buffer Index

There are 3 locations which are used here:

UBO Location, given by OpenGL when you generate one
Uniform Block Location, given by OpenGL when queried from a successfully linked shader program
A USER-supplied Binding Buffer Index

The Buffer Index is a number you pick in the range [0, maxUniformIndex). The maxUniformIndex can be retrieved from OpenGL, once a valid context exists, with the following command:

int maxUniformIndex;
GL.GetInteger(GetPName.MaxUniformBufferBindings, out maxUniformIndex);

The maximum is very implementation dependent; between my machines I have values of 24, 36 and 72.

Setting up a Buffer is done like this in the initialization of your code after the context exists:

// Global Variables
int BufferUBO; // Location for the UBO given by OpenGL
int BufferIndex = 0; // Index to use for the buffer binding (All good things start at 0 )
int UniformBlockLocation; // Uniform Block Location in the program given by OpenGL
 
Light UBOData;
 
void InitializeUniformBuffer() {
	GL.GenBuffers(1, out BufferUBO); // Generate the buffer
	GL.BindBuffer(BufferTarget.UniformBuffer, BufferUBO); // Bind the buffer for writing
	GL.BufferData(BufferTarget.UniformBuffer, (IntPtr)(sizeof(float) * 8), IntPtr.Zero, BufferUsageHint.DynamicDraw); // Request the memory to be allocated (no initial data)
 
	GL.BindBufferRange(BufferTarget.UniformBuffer, BufferIndex, BufferUBO, (IntPtr)0, (IntPtr)(sizeof(float) * 8)); // Bind the created Uniform Buffer to the Buffer Index
}

Note: In the above code the BufferUsageHint is DynamicDraw, which means we are planning to update the data occasionally. If you plan to update the data every frame, I would suggest changing the hint to StreamDraw.

Next, we link the Buffer Index to the Uniform Block of the shader program. This is done only once for each program, usually after creation:

UniformBlockLocation = GL.GetUniformBlockIndex(programID, "Light");
GL.UniformBlockBinding(programID, UniformBlockLocation, BufferIndex);

Then, whenever we want to load the uniform block's data, we can fill it by calling a function like this:

void FillUniformBuffer() {
	GL.BindBuffer(BufferTarget.UniformBuffer, BufferUBO);
	GL.BufferSubData(BufferTarget.UniformBuffer, (IntPtr)0, (IntPtr)(sizeof(float) * 8), ref UBOData);
	GL.BindBuffer(BufferTarget.UniformBuffer, 0);
}

Admittedly this is not a particularly useful example. A more useful implementation would be an array of lights: updating a list of 3 or 4 lights on 20 programs would be time consuming without a UBO.

To create an array of lights, the shader changes slightly to the following:

const int Light_Count = 4;
 
struct LightInformation {
	vec4 dirPosType;
	vec4 intensity;
};
 
layout(std140) uniform Light {
	LightInformation Lights[Light_Count];
};

As you can see, a struct definition in the shader is similar to our C# one.

Our C# struct remains the same, but the variable changes to this:

Light[] UBOData = new Light[4];

And all the buffer creation and filling sizes change to this:

sizeof(float) * 8 * UBOData.Length

This is the size of one element's data multiplied by the number of elements. So our code will change to this:

// Global Variables
int BufferUBO; // Location for the UBO given by OpenGL
int BufferIndex = 0; // Index to use for the buffer binding (All good things start at 0 )
int UniformBlockLocation; // Uniform Block Location in the program given by OpenGL
 
Light[] UBOData = new Light[4];
 
void InitializeUniformBuffer() {
	GL.GenBuffers(1, out BufferUBO); // Generate the buffer
	GL.BindBuffer(BufferTarget.UniformBuffer, BufferUBO); // Bind the buffer for writing
	GL.BufferData(BufferTarget.UniformBuffer, (IntPtr)(sizeof(float) * 8 * UBOData.Length), IntPtr.Zero, BufferUsageHint.DynamicDraw); // Request the memory to be allocated (no initial data)
 
	GL.BindBufferRange(BufferTarget.UniformBuffer, BufferIndex, BufferUBO, (IntPtr)0, (IntPtr)(sizeof(float) * 8 * UBOData.Length)); // Bind the created Uniform Buffer to the Buffer Index
}
 
void FillUniformBuffer() {
	GL.BindBuffer(BufferTarget.UniformBuffer, BufferUBO);
	GL.BufferSubData(BufferTarget.UniformBuffer, (IntPtr)0, (IntPtr)(sizeof(float) * 8 * UBOData.Length), UBOData);
	GL.BindBuffer(BufferTarget.UniformBuffer, 0);
}

A point to note is how the UBOData variable is passed: in the first example it was passed by ref, but now that it is an array it can no longer be passed by ref.

In the shader, a particular light's information is now accessed like this (to get the intensity of the second light):

Lights[1].intensity; // As always a Zero based Index, 1 is the Second Light in the array

If any new information is added to the struct when it is used in an array, please bear in mind the 4N alignment, as the entire array can become useless if this rule is not obeyed.

For example, adding a light range in the shader:

struct LightInformation {
	vec4 dirPosType;
	vec4 intensity;
	float maxRange;
};

And in C#:

struct Light {
    public Vector3 dirPos;
    public float type;
    public Vector4 intensity;
    public float maxRange;
};

This is going to throw all of the array's values out of sync. At index 0 the information will be correct, but the rest are doomed, because the rules are applied to the shader's struct and not to the C# struct.
To correct this we apply the rules ourselves. To recap, the shader aligns the next element in the array to the base alignment, hence:

dirPosType is 4 floats of the first chunk
intensity is 4 floats of the second chunk
maxRange is 1 float of the third chunk, leaving a remainder of 3 floats which OpenGL will skip and leave unused

For a total of 3 chunks of 4 floats (total used space of 12 floats).

The C# struct has this:

dirPos is 3 floats of the first chunk
type is 1 float for the remainder of the first chunk
intensity is 4 floats of the second chunk
maxRange is 1 float of the third chunk

When the UBOData array is serialized, the next array element's dirPos is written into the last 3 floats of the third chunk. This breaks the rules, so to correct it we need to add the appropriate padding, like so:

struct Light {
    public Vector3 dirPos;
    public float type;
    public Vector4 intensity;
    public float maxRange;
    public float padTheSecondFloatOfTheThirdChunk;
    public float padTheThirdFloatOfTheThirdChunk;
    public float padTheFourthFloatOfTheThirdChunk;
};
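
With the extra field, the element size used for buffer creation and filling also has to grow from 8 floats to 12. Rather than counting floats by hand, the size can be taken from the struct itself. The following is a sketch under assumptions: Vec3 and Vec4 stand in for OpenTK's Vector3 and Vector4, and UnpaddedLight, PaddedLight and LightArraySize are hypothetical names used only to compare the two layouts.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-ins for OpenTK's Vector3 / Vector4 (assumed to be plain sequential floats).
[StructLayout(LayoutKind.Sequential)]
struct Vec3 { public float X, Y, Z; }

[StructLayout(LayoutKind.Sequential)]
struct Vec4 { public float X, Y, Z, W; }

// The struct before padding: elements would be 36 bytes apart.
[StructLayout(LayoutKind.Sequential)]
struct UnpaddedLight
{
    public Vec3 dirPos;
    public float type;
    public Vec4 intensity;
    public float maxRange;
}

// The struct after padding: elements are 48 bytes apart, as the shader expects.
[StructLayout(LayoutKind.Sequential)]
struct PaddedLight
{
    public Vec3 dirPos;
    public float type;
    public Vec4 intensity;
    public float maxRange;
    public float pad0, pad1, pad2;
}

static class LightArraySize
{
    // Size of one chunk (4N) in bytes.
    public const int ChunkBytes = 4 * sizeof(float);

    // Distance in bytes between consecutive array elements on the C# side.
    public static int Stride<T>() where T : struct => Marshal.SizeOf<T>();

    // True when every array element starts on a chunk boundary.
    public static bool StartsOnChunk<T>() where T : struct =>
        Stride<T>() % ChunkBytes == 0;

    // Total byte size to use for buffer creation and filling.
    public static int BufferSize<T>(int count) where T : struct =>
        Stride<T>() * count;
}
```

LightArraySize.Stride<UnpaddedLight>() is 36, which is not a multiple of 16, so the second light would start in the middle of a chunk. LightArraySize.Stride<PaddedLight>() is 48, and LightArraySize.BufferSize<PaddedLight>(4) gives 192 bytes, which is the value to use in place of sizeof(float) * 8 * UBOData.Length.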