A First Look at C# and the GPU

Latest recommended article published on 2024-10-11 07:04:43

悟無生

A first look at general-purpose GPU computing in C#

First of all, my thanks to the original author for this repost, which was made without permission!

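GPUs have more parallel computing power than CPUs, so quite a few projects that make use of the GPU have been showing up lately. On InfoQ I came across an article introducing Accelerator-V2, a Microsoft Research project that can only be downloaded after registering. It looked like a reasonable first step into general-purpose GPU computing for me, so I downloaded it.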

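The installer includes several sample programs, such as the famous Game of Life. For someone who has only just started with GPU computing, though, the Life sample is still a bit complex. So I simplified things and ran only some simple calculations. I found that if the output argument passed to DX9Target.ToArray is an int array, it throws an "unsupported operation" exception. On reflection that makes sense: graphics cards really are at their best with floating-point arithmetic.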
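One way to live with a float-only target is to carry integer data in float arrays and convert at the boundaries. The sketch below uses only the .NET base library, not Accelerator, and the names `IntAsFloat`, `ToFloats`, `ToInts` and `SurvivesRoundTrip` are made up for illustration. It relies on the fact that a 32-bit float represents every integer up to 2^24 exactly, so larger values may not survive the round trip.

```csharp
using System;

// Hypothetical helper, not part of Accelerator: moves int data through float arrays.
public static class IntAsFloat
{
    // Every integer with magnitude up to 2^24 is exactly representable as a float.
    public const int MaxExactInt = 1 << 24;

    // Widen an int array to float so a float-only pipeline can consume it.
    public static float[] ToFloats(int[] values) =>
        Array.ConvertAll(values, v => (float)v);

    // Convert results back to int, rounding to the nearest whole number.
    public static int[] ToInts(float[] values) =>
        Array.ConvertAll(values, v => (int)Math.Round(v));

    // True when a single value is unchanged by an int -> float -> int trip.
    public static bool SurvivesRoundTrip(int value) =>
        (int)(float)value == value;

    public static void Main()
    {
        int[] input = { 0, 1, 1023 * 1023, MaxExactInt };
        int[] output = ToInts(ToFloats(input));
        Console.WriteLine(string.Join(", ", output));

        // One past 2^24 is the first positive integer a float cannot hold exactly.
        Console.WriteLine(SurvivesRoundTrip(MaxExactInt + 1));
    }
}
```

The largest seed in the listings that follow, 1023 * 1023, is well below 2^24, so the x * y input grid itself is stored without rounding.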

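I had assumed that GPU computing was a feature that only arrived with DirectX 11, but Accelerator targets DirectX 9. Presumably DirectX 11 offers more compute capability and a simpler way of using it.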

To get a rough comparison of CPU and GPU speed, I also wrote a parallel program using .NET 4. Because DX9Target does not support int, the array here uses float as well:

 
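using System;
using System.Diagnostics;
using System.Drawing;
using System.Linq;
using System.Windows.Forms;

// The designer half of this partial class supplies InitializeComponent and pictureBox1.
public partial class Form1 : Form
{
    private const int GridSize = 1024;
    private float[] _map;

    public Form1()
    {
        InitializeComponent();
        // GridSize x GridSize grid stored as a flat array, seeded with x * y.
        _map = new float[GridSize * GridSize];
        for (int y = 0; y < GridSize; y++)
        {
            for (int x = 0; x < GridSize; x++)
            {
                _map[x * GridSize + y] = x * y;
            }
        }
        Render();
    }

    private void Start_Click(object sender, EventArgs e)
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        // PLINQ applies the formula to every element across the CPU cores.
        // Note: without AsOrdered(), PLINQ does not guarantee that results keep the input order.
        _map = _map.AsParallel().Select(p => p * p * p / 4 + 194).ToArray();
        var time = stopwatch.ElapsedMilliseconds;
        this.Text = time.ToString(); // elapsed milliseconds shown in the title bar
        Render();
    }

    private void Render()
    {
        var workingBitmap = new Bitmap(pictureBox1.Width, pictureBox1.Height);

        // Samples every second grid cell, so pictureBox1 is assumed to be
        // at most GridSize / 2 pixels in each dimension.
        for (int y = 0; y < pictureBox1.Height; y++)
        {
            for (int x = 0; x < pictureBox1.Width; x++)
            {
                // -0x1000000 is 0xFF000000: full alpha, OR-ed with the value as the RGB bits.
                workingBitmap.SetPixel(x, y, Color.FromArgb(-0x1000000 | (int)_map[x * 2 * GridSize + y * 2]));
            }
        }
        pictureBox1.Image = workingBitmap;
    }
}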
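Because each click feeds the previous output back in as input, successive timings in the listing above are not measured on the same data. The console sketch below is one hedged way to get a steadier CPU baseline: it rebuilds the same x * y input once, warms up, then times a plain loop against PLINQ on identical data. It is not the original author's benchmark; `CpuBenchmark` and its method names are illustrative, and `AsOrdered()` is added so output element i lines up with input element i.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical console benchmark; only the formula and grid size come from the article.
public static class CpuBenchmark
{
    public const int GridSize = 1024;

    // Same per-element formula as the WinForms listing.
    public static float Kernel(float p) => p * p * p / 4 + 194;

    // Flat GridSize x GridSize array seeded with x * y.
    public static float[] BuildInput()
    {
        var map = new float[GridSize * GridSize];
        for (int y = 0; y < GridSize; y++)
            for (int x = 0; x < GridSize; x++)
                map[x * GridSize + y] = x * y;
        return map;
    }

    // Single-threaded reference implementation.
    public static float[] RunSequential(float[] input)
    {
        var output = new float[input.Length];
        for (int i = 0; i < input.Length; i++)
            output[i] = Kernel(input[i]);
        return output;
    }

    // PLINQ version; AsOrdered preserves the input order in the result.
    public static float[] RunParallel(float[] input) =>
        input.AsParallel().AsOrdered().Select(p => Kernel(p)).ToArray();

    // Wall-clock time of one run, in milliseconds.
    public static long TimeMs(Func<float[], float[]> run, float[] input)
    {
        var stopwatch = Stopwatch.StartNew();
        run(input);
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        float[] input = BuildInput();

        // Warm-up so JIT compilation and thread start-up are not counted.
        RunSequential(input);
        RunParallel(input);

        for (int run = 0; run < 4; run++)
        {
            long sequentialMs = TimeMs(RunSequential, input);
            long parallelMs = TimeMs(RunParallel, input);
            Console.WriteLine($"sequential: {sequentialMs} ms, PLINQ: {parallelMs} ms");
        }
    }
}
```

The input array is never overwritten here, so every run does the same arithmetic on the same values.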

The code that uses Accelerator looks like this:

 
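using System;
using System.Diagnostics;
using System.Drawing;
using System.Windows.Forms;
using Microsoft.ParallelArrays; // Accelerator v2: DX9Target, FloatParallelArray

// The designer half of this partial class supplies InitializeComponent and pictureBox1.
public partial class Form1 : Form
{
    private const int GridSize = 1024;
    private readonly DX9Target _target;
    private float[,] _map;

    public Form1()
    {
        InitializeComponent();
        _target = new DX9Target();
        // GridSize x GridSize grid seeded with x * y.
        _map = new float[GridSize, GridSize];
        for (int y = 0; y < GridSize; y++)
        {
            for (int x = 0; x < GridSize; x++)
            {
                _map[x, y] = x * y;
            }
        }
        Render();
    }

    private void Start_Click(object sender, EventArgs e)
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // The operators describe a whole-array computation;
        // ToArray evaluates it on the DX9 target and copies the result back into _map.
        var p = new FloatParallelArray(_map);
        p = p * p * p / 4 + 194;
        _target.ToArray(p, out _map);

        var time = stopwatch.ElapsedMilliseconds;
        this.Text = time.ToString(); // elapsed milliseconds shown in the title bar
        Render();
    }

    private void Render()
    {
        var workingBitmap = new Bitmap(pictureBox1.Width, pictureBox1.Height);

        // Samples every second grid cell, so pictureBox1 is assumed to be
        // at most GridSize / 2 pixels in each dimension.
        for (int y = 0; y < pictureBox1.Height; y++)
        {
            for (int x = 0; x < pictureBox1.Width; x++)
            {
                // -0x1000000 is 0xFF000000: full alpha, OR-ed with the value as the RGB bits.
                workingBitmap.SetPixel(x, y, Color.FromArgb(-0x1000000 | (int)_map[x * 2, y * 2]));
            }
        }
        pictureBox1.Image = workingBitmap;
    }
}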

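I tested on my laptop (Core i5 430 CPU, ATI 5650 graphics card), clicking the Start button several times in each program. After about three runs the picture box turns completely black. From that point the plain parallel program slows down, while the GPU program's speed shows no noticeable change. Four runs of the plain parallel program took 96, 89, 277 and 291 ms; four runs of the GPU program took 71, 40, 35 and 50 ms. Going by this test alone, on my machine the GPU program is roughly twice as fast as the plain parallel one. The test itself is not necessarily fair, so treat the results as a rough reference only.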
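A likely reason the picture goes black after a few clicks (an inference, not something the original states) is that each click cubes the previous result, so the values quickly leave the range of a 32-bit float and become infinity, which no longer converts to a meaningful pixel color. The sketch below counts how many passes that takes for a given seed. `OverflowCheck` and `PassesUntilInfinite` are hypothetical names, and the counts assume IEEE 754 single-precision arithmetic as used by current .NET runtimes.

```csharp
using System;

// Hypothetical helper that replays the article's formula on a single value.
public static class OverflowCheck
{
    // Same per-element formula as both listings.
    public static float Kernel(float p) => p * p * p / 4 + 194;

    // Number of passes until the value becomes infinite, or -1 if it stays finite.
    public static int PassesUntilInfinite(float seed, int maxPasses = 10)
    {
        float value = seed;
        for (int pass = 1; pass <= maxPasses; pass++)
        {
            value = Kernel(value);
            if (float.IsInfinity(value))
                return pass;
        }
        return -1;
    }

    public static void Main()
    {
        float smallestSeed = 0f;            // x * y at the grid origin
        float largestSeed = 1023f * 1023f;  // x * y at the far corner

        Console.WriteLine($"smallest seed overflows after {PassesUntilInfinite(smallestSeed)} passes");
        Console.WriteLine($"largest seed overflows after {PassesUntilInfinite(largestSeed)} passes");
    }
}
```

Under those assumptions the largest seed overflows on the second pass and the smallest on the fourth, so by the fourth click every element is infinite. Resetting the input before each timed run would keep the measurements comparable.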

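That said, parallel programming in Accelerator clearly feels heavily constrained. Code that is normally easy to write takes a lot of effort to convert into this parallel model, and some logic cannot be expressed at all. Compared with Accelerator, Brahma code is much easier to write and to read, and its Game of Life sample is simple and clear. Unfortunately, I built Brahma v0.1 and v0.4, and on my machine the DirectX samples produced no visible result, while the OpenGL samples threw a "The generated GLSL was invalid" exception. It looks like it will have to mature before it is usable.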

Reposted from: http://llf.hanzify.org/studio/article/show/csharp_gpgpu