Performing Calculations on a GPU
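Platform: Xcode, iOS development; beginner level; an introduction to GPU development
Audience: beginners in GPU programming with some C++, Objective-C, Metal, or OpenGL background and basic English reading ability
Summary: Use Metal to find GPUs and perform calculations on them.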
Overview
In this sample, you’ll learn essential tasks that are used in all Metal apps.
You’ll see how to convert a simple function written in C to Metal Shading Language (MSL) so that it can be run on a GPU.
You’ll find a GPU, prepare the MSL function to run on it by creating a pipeline, and create data objects accessible to the GPU.
To execute the pipeline against your data, create a command buffer, write commands into it, and commit the buffer to a command queue.
Metal sends the commands to the GPU to be executed.
Write a GPU Function to Perform Calculations
To illustrate GPU programming, this app adds corresponding elements of two arrays together, writing the results to a third array.
Listing 1 shows a function that performs this calculation on the CPU, written in C.
It loops over the index, calculating one value per iteration of the loop.
Listing 1 Array addition, written in C
void add_arrays(const float* inA,
                const float* inB,
                float* result,
                int length)
{
    for (int index = 0; index < length; index++)
    {
        result[index] = inA[index] + inB[index];
    }
}
Each value is calculated independently, so the values can be safely calculated concurrently.
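One way to check this independence on the CPU is to compute the elements in a different order and confirm that the output does not change. The following is a minimal sketch, not part of the sample: add_arrays_reversed, order_is_irrelevant, and MAX_LENGTH are hypothetical names introduced here for illustration, and everything runs serially on the CPU.

```c
#include <assert.h>
#include <string.h>

#define MAX_LENGTH 64  /* arbitrary cap chosen for this sketch */

/* The loop from Listing 1, unchanged. */
void add_arrays(const float* inA, const float* inB, float* result, int length)
{
    for (int index = 0; index < length; index++)
    {
        result[index] = inA[index] + inB[index];
    }
}

/* Hypothetical variant: visits the indices from last to first. */
void add_arrays_reversed(const float* inA, const float* inB, float* result, int length)
{
    for (int index = length - 1; index >= 0; index--)
    {
        result[index] = inA[index] + inB[index];
    }
}

/* Returns 1 when both traversal orders produce byte-identical results. */
int order_is_irrelevant(const float* inA, const float* inB, int length)
{
    float forward[MAX_LENGTH];
    float backward[MAX_LENGTH];

    assert(length >= 0 && length <= MAX_LENGTH);
    add_arrays(inA, inB, forward, length);
    add_arrays_reversed(inA, inB, backward, length);
    return memcmp(forward, backward, (size_t)length * sizeof(float)) == 0;
}
```

Because no element reads a value written by another element, any visiting order gives the same output, and that is the property that makes it safe to compute all elements at the same time.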
To perform the calculation on the GPU, you need to rewrite this function in Metal Shading Language (MSL).
MSL is a variant of C++ designed for GPU programming.
In Metal, code that runs on GPUs is called a shader, because historically such code was first used to calculate colors in 3D graphics.
Listing 2 shows a shader in MSL that performs the same calculation as Listing 1.
The sample project defines this function in the add.metal file.
Xcode builds all .metal files in the application target and creates a default Metal library, which it embeds in your app.
You’ll see how to load the default library later in this sample.
Listing 2 Array addition, written in MSL
kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // the for-loop is replaced with a collection of threads, each of which
    // calls this function.
    result[index] = inA[index] + inB[index];
}
Listing 1 and Listing 2 are similar, but there are some important differences in the MSL version. Take a closer look at Listing 2.
First, the function adds the kernel keyword, which declares that the function is:
- A public GPU function. Public functions are the only functions that your app can see. Public functions also can’t be called by other shader functions.
- A compute function (also known as a compute kernel), which performs a parallel calculation using a grid of threads.
See Using a Render Pipeline to Render Primitives to learn the other function keywords used to declare public graphics functions.
The add_arrays function declares three of its arguments with the device keyword, which says that these pointers are in the device address space.
MSL defines several disjoint address spaces for memory.
Whenever you declare a pointer in MSL, you must supply a keyword to declare its address space.
Use the device address space to declare persistent memory that the GPU can read from and write to.
Listing 2 removes the for-loop from Listing 1, because the function is now going to be called by multiple threads in the compute grid.
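To make the removal of the loop concrete, the sketch below emulates a 1D grid on the CPU. It is only an illustration under stated assumptions: add_arrays_thread and emulate_1d_grid are hypothetical names, not Metal API, and the serial loop stands in for what a GPU would do with many threads at once. In real MSL, the index comes from the thread_position_in_grid attribute instead of a loop variable.

```c
#include <assert.h>

/* CPU stand-in for the body of the kernel in Listing 2:
   each call handles exactly one element, identified by index. */
void add_arrays_thread(const float* inA, const float* inB,
                       float* result, unsigned int index)
{
    result[index] = inA[index] + inB[index];
}

/* Hypothetical emulation of a 1D grid: calls the per-thread function once
   for every position in the grid. A GPU would run these calls concurrently;
   this sketch runs them one after another. */
void emulate_1d_grid(const float* inA, const float* inB,
                     float* result, unsigned int gridSize)
{
    assert(gridSize == 0 || (inA && inB && result));
    for (unsigned int index = 0; index < gridSize; index++)
    {
        add_arrays_thread(inA, inB, result, index);
    }
}
```

The loop has moved out of the function that does the arithmetic and into the code that launches it, which mirrors the division of labor between a shader and the grid that invokes it.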
This sample creates a 1D grid of t