An Even Easier Introduction to CUDA (personal translation, unfinished)

Original post:

https://devblogs.nvidia.com/even-easier-introduction-cuda/

An Even Easier Introduction to CUDA


This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA. I wrote a previous “Easy Introduction” to CUDA in 2013 that has been very popular over the years. But CUDA programming has gotten easier, and GPUs have gotten much faster, so it’s time for an updated (and even easier) introduction.

CUDA C++ is just one of the ways you can create massively parallel applications with CUDA. It lets you use the powerful C++ programming language to develop high performance algorithms accelerated by thousands of parallel threads running on GPUs. Many developers have accelerated their computation- and bandwidth-hungry applications this way, including the libraries and frameworks that underpin the ongoing revolution in artificial intelligence known as Deep Learning.

So, you’ve heard about CUDA and you are interested in learning how to use it in your own applications. If you are a C or C++ programmer, this blog post should give you a good start. To follow along, you’ll need a computer with a CUDA-capable GPU (Windows, Mac, or Linux, and any NVIDIA GPU should do), or a cloud instance with GPUs (AWS, Azure, IBM SoftLayer, and other cloud service providers have them). You’ll also need the free CUDA Toolkit installed.
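
A quick way to check your setup is to confirm that the GPU driver and the toolkit’s compiler are visible from the command line (a small sanity check; nvidia-smi ships with the NVIDIA driver and nvcc with the CUDA Toolkit):

> nvidia-smi
> nvcc --version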

Let’s get started!

Starting Simple
We’ll start with a simple C++ program that adds the elements of two arrays with a million elements each.

#include <iostream>
#include <math.h>

// function to add the elements of two arrays
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
      y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20; // 1M elements

  float *x = new float[N];
  float *y = new float[N];

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the CPU
  add(N, x, y);

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  delete [] x;
  delete [] y;

  return 0;
}


First, compile and run this C++ program. Put the code above in a file and save it as add.cpp, and then compile it with your C++ compiler. I’m on a Mac so I’m using clang++, but you can use g++ on Linux or MSVC on Windows.

> clang++ add.cpp -o add

Then run it:

> ./add
 Max error: 0.000000

(On Windows you may want to name the executable add.exe and run it with .\add.)
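
If you build with MSVC from a Developer Command Prompt, the equivalent steps would look roughly like this (a sketch; cl produces add.exe by default, and /EHsc simply enables standard C++ exception handling for <iostream>):

> cl /EHsc add.cpp
> .\add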

As expected, it prints that there was no error in the summation and then exits. Now I want to get this computation running (in parallel) on the many cores of a GPU. It’s actually pretty easy to take the first steps.
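
As a preview of where this is going, a minimal CUDA version of the same program looks roughly like this (a sketch of the very first step, not a tuned parallel version yet): mark add as a __global__ kernel so it runs on the GPU, allocate the arrays with Unified Memory so both the CPU and the GPU can access them, launch the kernel, and wait for the GPU to finish before checking the results.

#include <iostream>
#include <math.h>

// CUDA kernel to add the elements of two arrays.
// __global__ tells the CUDA C++ compiler this function runs on the GPU
// and can be called from host (CPU) code.
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
      y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20; // 1M elements

  // Allocate Unified Memory -- accessible from both CPU and GPU
  float *x, *y;
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Launch add() kernel on the GPU with one block of one thread
  // (still a serial loop on the GPU -- parallelizing it is the next step)
  add<<<1, 1>>>(N, x, y);

  // Wait for the GPU to finish before accessing the results on the host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free Unified Memory
  cudaFree(x);
  cudaFree(y);

  return 0;
}

Save this in a file with a .cu extension (for example add.cu) and compile it with nvcc, the CUDA C++ compiler that ships with the toolkit, then run it as before:

> nvcc add.cu -o add_cuda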
