Getting Started with Metal: Adding Large Arrays in Parallel on the GPU

qqwyuli

Last modified 2023-05-21 13:02:52

Tags: learning, Metal, getting started, GPU parallel compute functions, MSL

First published 2023-05-20 21:10:21

Article link: https://blog.csdn.net/qqwyuli/article/details/130785820

I. Programming guide PDF downloads (Chinese and English versions)

1. Metal Programming Guide (PDF):
https://github.com/dennie-lee/ios_tech_record/raw/main/Metal学习PDF/Metal 编程指南.pdf
2. Metal Shading Language (MSL) guide (PDF):
https://github.com/dennie-lee/ios_tech_record/raw/main/Metal学习PDF/Metal 着色语言指南.pdf
3. Supplement: Apple's official API documentation:
https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu

II. Overview

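This article uses element-wise addition of two arrays with the same number of elements as an entry point into Metal (GPU) compute functions. Render functions, i.e. vertex and fragment shader functions, are not covered here; they are introduced in a separate article: https://blog.csdn.net/qqwyuli/article/details/130787337. The example walks through the basic tasks used in every Metal app: converting a simple function written in C into the Metal Shading Language (MSL) so that it can run on the GPU, creating a pipeline that prepares the MSL function to run, and creating data objects the GPU can access. To execute the pipeline on that data, you create a command buffer, encode commands into it, and commit the buffer to a command queue; Metal then sends the commands to the GPU for execution.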
[Figure: Apple's official diagram of the compute-function workflow]

III. Comparing the C and MSL functions that add two arrays

1. C function: adding two arrays

void add_arrays(const float* inA,
                const float* inB,
                float* result,
                int length){
    for (int index = 0; index < length ; index++)
    {
        result[index] = inA[index] + inB[index];
    }
}
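Because Section IV.3 logs how long the Metal version takes, a CPU baseline makes that number easier to interpret. Below is a minimal sketch, assuming a C11 compiler: it repeats the C function above and wraps it in a hypothetical helper, `time_add_arrays_cpu` (not part of the original project), that times one call with the standard `timespec_get`.

```c
#include <time.h>

/* The C function from above, repeated so this block compiles on its own. */
void add_arrays(const float* inA,
                const float* inB,
                float* result,
                int length){
    for (int index = 0; index < length; index++)
    {
        result[index] = inA[index] + inB[index];
    }
}

/* Hypothetical helper: runs add_arrays once and returns the elapsed
   wall-clock time in seconds, similar in spirit to the NSDate-based
   measurement used for the Metal version. */
double time_add_arrays_cpu(const float* inA, const float* inB,
                           float* result, int length){
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);
    add_arrays(inA, inB, result, length);
    timespec_get(&end, TIME_UTC);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For a fair comparison, call it with the same arrayLength as the Metal run. Note that the Metal timing in Section IV.3 also includes encoding, committing and waitUntilCompleted, so the two numbers do not measure exactly the same thing.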

2. MSL function: adding two arrays

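// kernel: marks a compute function that the GPU runs once per thread
// device: the pointers refer to buffers in the device address space
kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]]){
    // No loop: each thread adds exactly one element, selected by its grid position
    result[index] = inA[index] + inB[index];
}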
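The key difference from the C version is that the loop disappears: the loop body becomes the whole function, and the GPU runs it once per thread, with `[[thread_position_in_grid]]` supplying the index. The C sketch below emulates that on the CPU. `add_arrays_thread` and `emulate_dispatch` are hypothetical names, and the sequential loop only stands in for what the GPU does in parallel.

```c
/* Hypothetical CPU stand-in for the MSL kernel: handles exactly one element;
   `index` plays the role of [[thread_position_in_grid]]. */
static void add_arrays_thread(const float *inA, const float *inB,
                              float *result, unsigned int index)
{
    result[index] = inA[index] + inB[index];
}

/* Emulates dispatching a 1D grid of `grid_width` threads. The GPU gives no
   ordering guarantee between threads, so this loop deliberately visits the
   indices in reverse to show the result does not depend on order. */
void emulate_dispatch(const float *inA, const float *inB, float *result,
                      unsigned int grid_width)
{
    for (unsigned int i = grid_width; i > 0; i--) {
        add_arrays_thread(inA, inB, result, i - 1);
    }
}
```

This only works because every thread writes a different element and reads nothing another thread writes; that independence is what makes element-wise addition a natural first GPU example.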

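Keyword notes (see the MSL guide PDF, downloadable from the links above):
[Figure: keyword explanations excerpted from the MSL guide]
[[thread_position_in_grid]] is also described in the guide. For a more detailed explanation, see https://juejin.cn/post/7085633906501746724, which also covers [[threadgroup_position_in_grid]] and [[threads_per_threadgroup]].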
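For a one-dimensional dispatch, these attributes are related by thread_position_in_grid = threadgroup_position_in_grid × threads_per_threadgroup + thread_position_in_threadgroup. The C sketch below (function names hypothetical) computes that formula and checks that walking every threadgroup, including a partial last one, reaches each grid position exactly once.

```c
#include <stdbool.h>
#include <stdlib.h>

/* 1D relation between the MSL position attributes. */
unsigned int grid_position(unsigned int threadgroup_position_in_grid,
                           unsigned int threads_per_threadgroup,
                           unsigned int thread_position_in_threadgroup)
{
    return threadgroup_position_in_grid * threads_per_threadgroup
         + thread_position_in_threadgroup;
}

/* Returns true if every (threadgroup, thread-in-group) pair of a 1D grid of
   `grid_width` threads maps to a distinct position and all positions are hit.
   The last threadgroup may be partial, as with dispatchThreads. */
bool covers_grid_once(unsigned int grid_width,
                      unsigned int threads_per_threadgroup)
{
    unsigned char *seen = calloc(grid_width, 1);
    if (seen == NULL) return false;
    unsigned int groups = (grid_width + threads_per_threadgroup - 1)
                        / threads_per_threadgroup;
    bool ok = true;
    for (unsigned int g = 0; g < groups; g++) {
        for (unsigned int t = 0; t < threads_per_threadgroup; t++) {
            unsigned int pos = grid_position(g, threads_per_threadgroup, t);
            if (pos >= grid_width) break;   /* past the end of a partial group */
            if (seen[pos]++) ok = false;    /* same position visited twice */
        }
    }
    for (unsigned int i = 0; i < grid_width; i++) {
        if (!seen[i]) ok = false;           /* position never visited */
    }
    free(seen);
    return ok;
}
```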

IV. Walkthrough of the key code

1. Initialization: load the .metal files and create a pipeline state object for the compute function

// Assumes _mDevice was created earlier, for example:
//_mDevice = MTLCreateSystemDefaultDevice();
// Load the shader files with a .metal file extension in the project
id<MTLLibrary> newDefaultLibrary = [_mDevice newDefaultLibrary];
if (newDefaultLibrary == nil){
    NSLog(@"Failed to find the default library");
    return nil;
}
    
id<MTLFunction> newFunction = [newDefaultLibrary newFunctionWithName:@"add_arrays"];
if (newFunction == nil){
    NSLog(@"Failed to find the adder function");
    return nil;
}
    
NSError *error;
// Create a compute pipeline state object from the kernel function defined in the project's .metal file (the Add file)
_mAddFunctionPSO = [_mDevice newComputePipelineStateWithFunction:newFunction error:&error];
if (_mAddFunctionPSO == nil){
    //  If the Metal API validation is enabled, you can find out more information about what
    //  went wrong.  (Metal API validation is enabled by default when a debug build is run
    //  from Xcode)
    NSLog(@"Failed to create pipeline state object, error: %@", error);
    return nil;
}

// Command queue
_mCommandQueue = [_mDevice newCommandQueue];
if (_mCommandQueue == nil){
    NSLog(@"Failed to create the command queue");
    return nil;
}

2. Initializing the array data

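// Initialize the array data. bufferSize and arrayLength are defined elsewhere in the project (bufferSize is presumably arrayLength * sizeof(float)).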
- (void)prepareData{
    _mBufferA = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
    _mBufferB = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
    _mBufferResult = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
    [self generateRandomFloatData:_mBufferA];
    [self generateRandomFloatData:_mBufferB];
}

- (void)generateRandomFloatData:(id<MTLBuffer>)buffer{
    float *dataPtr = buffer.contents;
    for (int index = 0; index < arrayLength; index++){
        dataPtr[index] = (float)rand() / (float)RAND_MAX;
    }
}
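The snippet uses `bufferSize` and `arrayLength` without showing their definitions. The plain-C sketch below assumes `bufferSize = arrayLength * sizeof(float)` and uses a small hypothetical length; `malloc` stands in for a buffer created with `MTLResourceStorageModeShared`, whose memory the CPU writes directly through `contents`. The fill is the same as `generateRandomFloatData`.

```c
#include <stdlib.h>

/* Hypothetical sizes: the original snippet does not show how arrayLength and
   bufferSize are defined; bufferSize is assumed to be arrayLength * sizeof(float). */
#define ARRAY_LENGTH (1u << 10)
#define BUFFER_SIZE  (ARRAY_LENGTH * sizeof(float))

/* Same fill as generateRandomFloatData: each value lies in [0, 1]. */
void generate_random_float_data(float *data, unsigned int length)
{
    for (unsigned int index = 0; index < length; index++) {
        data[index] = (float)rand() / (float)RAND_MAX;
    }
}

/* Plain-memory stand-in for prepareData. Returns 0 on success,
   -1 if any allocation fails (all pointers are then NULL). */
int prepare_data(float **a, float **b, float **result)
{
    *a = malloc(BUFFER_SIZE);
    *b = malloc(BUFFER_SIZE);
    *result = malloc(BUFFER_SIZE);
    if (*a == NULL || *b == NULL || *result == NULL) {
        free(*a);
        free(*b);
        free(*result);
        *a = *b = *result = NULL;
        return -1;
    }
    generate_random_float_data(*a, ARRAY_LENGTH);
    generate_random_float_data(*b, ARRAY_LENGTH);
    return 0;
}
```

As in the Objective-C version, only the two inputs are filled; the result buffer is left for the GPU to write.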

3. Encoding the command and its arguments, then submitting it to the GPU

- (void)sendComputeCommand{
    // Create a command buffer to hold the commands
    id<MTLCommandBuffer> commandBuffer = [_mCommandQueue commandBuffer];
    assert(commandBuffer != nil);
    
    // Create a compute command encoder to encode the command and its arguments
    id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
    assert(computeEncoder != nil);
    
    NSTimeInterval startTimeInterval = [[NSDate date] timeIntervalSince1970];
    [self encodeAdderCommand:computeEncoder];
    
    // Finish encoding
    [computeEncoder endEncoding];
    // Commit the command buffer for execution
    [commandBuffer commit];
    // Block until the GPU has finished the computation
    [commandBuffer waitUntilCompleted];
    NSTimeInterval endTimeInterval = [[NSDate date] timeIntervalSince1970];
    NSLog(@"Adding two arrays of length %u (Metal) took: %f s", arrayLength, (endTimeInterval - startTimeInterval));
    // Verify the results
    [self verifyResults];
}
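`verifyResults` is called but its body is not shown. A minimal C sketch of what such a check could do, assuming the buffers' `contents` pointers are passed in as `float` arrays: recompute every sum on the CPU and compare. The name `verify_results` and the messages are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical C counterpart of verifyResults. The kernel performs a single
   float addition per element, so an exact comparison is expected to hold
   here; kernels doing more arithmetic may need a tolerance instead. */
bool verify_results(const float *a, const float *b, const float *result,
                    unsigned int length)
{
    for (unsigned int index = 0; index < length; index++) {
        float expected = a[index] + b[index];  /* assignment rounds to float */
        if (result[index] != expected) {
            printf("Mismatch at index %u: got %g, expected %g\n",
                   index, result[index], expected);
            return false;
        }
    }
    printf("All results match\n");
    return true;
}
```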

- (void)encodeAdderCommand:(id<MTLComputeCommandEncoder>)computeEncoder{
    // Encode the pipeline state object and its arguments
    [computeEncoder setComputePipelineState:_mAddFunctionPSO];
    // atIndex:0 matches the first parameter of the MSL function (inA). The index can also be set explicitly with the [[buffer(0)]] attribute; you can ignore that for now, see the MSL guide PDF for details.
    [computeEncoder setBuffer:_mBufferA offset:0 atIndex:0];
    // atIndex:1 matches the second parameter (inB); explicit form: [[buffer(1)]]
    [computeEncoder setBuffer:_mBufferB offset:0 atIndex:1];
    // atIndex:2 matches the third parameter (result); explicit form: [[buffer(2)]]
    [computeEncoder setBuffer:_mBufferResult offset:0 atIndex:2];
    
    // Total number of threads in the grid: one thread per element, each receiving its own [[thread_position_in_grid]] (see the keyword article referenced in Section III.2)
    MTLSize gridSize = MTLSizeMake(arrayLength, 1, 1);
    NSUInteger threadGroupSize = _mAddFunctionPSO.maxTotalThreadsPerThreadgroup;
    if (threadGroupSize > arrayLength){
        threadGroupSize = arrayLength;
    }
    // Threads per threadgroup
    MTLSize threadsPerThreadgroup = MTLSizeMake(threadGroupSize, 1, 1);
    [computeEncoder dispatchThreads:gridSize threadsPerThreadgroup:threadsPerThreadgroup];
}
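`dispatchThreads:threadsPerThreadgroup:` lets Metal handle a grid that is not a multiple of the threadgroup size, but it relies on GPU support for nonuniform threadgroup sizes. On GPUs without that support, a common alternative is `dispatchThreadgroups:threadsPerThreadgroup:` with the threadgroup count rounded up, plus a bounds check in the kernel (for example, passing the length as an extra, hypothetical argument and returning early when `index >= length`). The C sketch below reproduces the clamping logic above and the round-up arithmetic; both helper names are hypothetical.

```c
#include <assert.h>

/* Mirrors encodeAdderCommand: threads per threadgroup is the pipeline's
   maxTotalThreadsPerThreadgroup, clamped to the array length. */
unsigned long threads_per_group(unsigned long max_total_threads_per_threadgroup,
                                unsigned long array_length)
{
    assert(max_total_threads_per_threadgroup > 0);
    return max_total_threads_per_threadgroup > array_length
         ? array_length
         : max_total_threads_per_threadgroup;
}

/* Threadgroup count for a dispatchThreadgroups-style launch, where the grid
   must be a whole number of threadgroups: round up, so the last group may
   contain threads past the end of the array (hence the kernel bounds check). */
unsigned long threadgroup_count(unsigned long array_length,
                                unsigned long threads_per_threadgroup)
{
    return (array_length + threads_per_threadgroup - 1) / threads_per_threadgroup;
}
```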