Moore Threads Domestic GPU MUSA Parallel Programming Study Notes - 2025/01/17

 Learning Roadmap:

Section 1: Intro to Parallel Programming & MUSA

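  1. Deep Learning Ecosystem (Moore Threads domestic GPU MUSA parallel programming study notes, 2024/11/30, CSDN blog)
  2. Ubuntu + Driver + Toolkit + conda + PyTorch + torch_musa environment setup (2024/11/24: Ubuntu/Windows dual-boot installation | 2024/11/30: GPU driver & MUSA Toolkit installation)
  3. C/C++ Review (Moore Threads domestic GPU MUSA parallel programming study notes, 2024/11/22, CSDN blog)
  4. GPU intros (Moore Threads domestic GPU MUSA parallel programming study notes, 2024/11/25, CSDN blog)
  5. GPU Hardware Architecture (Moore Threads domestic GPU MUSA parallel programming study notes, 2024/11/26, CSDN blog)
  6. Write First Kernels (Here) (2024/11/27: thread hierarchy | 2024/11/28: First MUSA Kernel to Count Thread | 2024/12/02: vector addition | 2024/12/03: vector addition (3D) | 2024/12/04: matrix multiplication)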
  7. MUSA API
  8. Faster Matrix Multiplication
  9. Triton
  10. PyTorch Extensions (Moore Threads domestic GPU MUSA parallel programming study notes, 2024/11/21, CSDN blog)
  11. MNIST Multilayer Perceptron

Section 2: Parallel Programming & MUSA in Depth

  1. Analyzing Parallel Program Performance on a Quad-Core CPU
  2. Scheduling Task Graphs on a Multi-Core CPU
  3. A Simple Renderer in MUSA
  4. Optimizing DNN Performance on DNN Accelerator Hardware
  5. llm.c

Ref: 摩尔学院 (Moore Academy) High-Performance Computing with GPUs | Stanford CS149 - Video | Stanford CS149 - Syllabus

Atomic Operation

Ref: High-Performance Computing with GPUs, Chapter 5

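The code below demonstrates atomic operations in GPU programming. When many threads write to the same location in device memory, their unsynchronized read-modify-write sequences interleave, which leads to dirty reads and dirty writes (a data race) and to lost updates. The kernel incrementCounterNonAtomic performs a non-thread-safe increment, while incrementCounterAtomic performs a thread-safe one with atomicAdd. Each kernel is launched with 1000 blocks of 1000 threads, so the correct final count is 1,000,000. The output clearly shows that the thread-safe atomic add keeps the result accurate.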
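To make the lost-update problem concrete without a GPU, here is a minimal host-side C++ sketch. It is not MUSA code: it uses std::thread and std::atomic from the C++ standard library as stand-ins for device threads and atomicAdd, and the helper names lost_update_demo and atomic_count are made up for illustration. The first function replays by hand one bad interleaving of two threads doing read, add, write. The second lets real threads increment a shared atomic counter.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Replays one bad interleaving of two "threads" by hand (no real threads,
// so the outcome is deterministic): both read before either writes.
int lost_update_demo() {
    int counter = 0;
    int old_a = counter;   // thread A reads 0
    int old_b = counter;   // thread B reads 0 before A has written
    counter = old_a + 1;   // thread A writes 1
    counter = old_b + 1;   // thread B also writes 1: A's update is lost
    return counter;        // 1, although two increments ran
}

// Real threads incrementing a shared atomic counter. fetch_add makes each
// read-modify-write indivisible, so no increment can be lost.
int atomic_count(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Calling lost_update_demo() returns 1 even though two increments ran, while atomic_count(4, 1000) returns 4000. The GPU kernels in this post show the same contrast at a much larger thread count.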

Code location

MUSA PLAY GROUND - GitHub

Code

#include <musa_runtime.h>
#include <stdio.h>

#define NUM_THREADS 1000
#define NUM_BLOCKS 1000

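// Kernel without atomics (incorrect): the read, the add, and the write are
// three separate steps, so other threads can interleave between them
__global__ void incrementCounterNonAtomic(int* counter) {
    // no lock is taken before the read
    int old = *counter;
    int new_value = old + 1;
    // no lock is released after the write, so this store may overwrite
    // an update made by another thread in the meantime
    *counter = new_value;
}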

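// Kernel with atomics (correct): atomicAdd performs the whole
// read-modify-write as one indivisible operation
__global__ void incrementCounterAtomic(int* counter) {
    atomicAdd(counter, 1);  // the returned old value is not needed here
}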

int main() {
    int h_counterNonAtomic = 0;
    int h_counterAtomic = 0;
    int *d_counterNonAtomic, *d_counterAtomic;

    // Allocate device memory
    musaMalloc((void**)&d_counterNonAtomic, sizeof(int));
    musaMalloc((void**)&d_counterAtomic, sizeof(int));

    // Copy initial counter values to device
    musaMemcpy(d_counterNonAtomic, &h_counterNonAtomic, sizeof(int), musaMemcpyHostToDevice);
    musaMemcpy(d_counterAtomic, &h_counterAtomic, sizeof(int), musaMemcpyHostToDevice);

    // Launch kernels: 1000 blocks x 1000 threads = 1,000,000 increments each
    incrementCounterNonAtomic<<<NUM_BLOCKS, NUM_THREADS>>>(d_counterNonAtomic);
    incrementCounterAtomic<<<NUM_BLOCKS, NUM_THREADS>>>(d_counterAtomic);

    // Copy results back to host
    musaMemcpy(&h_counterNonAtomic, d_counterNonAtomic, sizeof(int), musaMemcpyDeviceToHost);
    musaMemcpy(&h_counterAtomic, d_counterAtomic, sizeof(int), musaMemcpyDeviceToHost);

    // Print results
    printf("Non-atomic counter value: %d\n", h_counterNonAtomic);
    printf("Atomic counter value: %d\n", h_counterAtomic);

    // Free device memory
    musaFree(d_counterNonAtomic);
    musaFree(d_counterAtomic);

    return 0;
}
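
The non-atomic kernel above fails because nothing stops another thread from slipping in between its read and its write. One common way to picture what an atomic add must guarantee is a compare-and-swap retry loop. The host-side C++ sketch below builds an add out of std::atomic's compare_exchange_weak. It is a conceptual model only, not a description of how MUSA hardware implements atomicAdd, and the names cas_add and cas_count are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Adds `value` to `target` with a compare-and-swap retry loop and returns
// the value that was stored just before the add.
int cas_add(std::atomic<int>& target, int value) {
    int expected = target.load();
    // If another thread changed `target` after our read, the exchange fails,
    // `expected` is refreshed with the current value, and we retry.
    while (!target.compare_exchange_weak(expected, expected + value)) {
    }
    return expected;
}

// Runs `num_threads` threads that each call cas_add(counter, 1)
// `increments_per_thread` times, then returns the final count.
int cas_count(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                cas_add(counter, 1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

The write only succeeds when the value is still the one that was read, so a conflicting update forces a retry instead of being silently overwritten. That is the property the plain read, add, write sequence in incrementCounterNonAtomic lacks.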

Compile and run

    mcc 00_atomicAdd.mu -o atomicAdd -mtgpu -O2 -lmusart

    ./atomicAdd

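Output

As the screenshot shows, the atomic add guarantees a correct result, which here means a count of 1,000,000 (1000 blocks × 1000 threads).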
