Testing Thrust, NVIDIA's parallel computing library

皿小草

Article link: https://blog.csdn.net/oqqYuan1234567890/article/details/105336669

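A short introduction to the NVIDIA Thrust library:
Thrust is a C++ parallel programming library which resembles the C++ Standard Library.
GitHub: https://github.com/thrust/thrust
It is a C++ parallel library that is installed by default with the CUDA environment.
In real workloads, the most useful operations are sort and reduce (sum). You might wonder: plenty of libraries can sort and sum, so why go to the trouble of using Thrust?
For some high-performance applications, CPU compute power is no longer enough, so heterogeneous accelerators such as GPUs, FPGAs, or NPUs are needed to make up for the CPU's weaknesses. Thrust is a library provided by NVIDIA that implements operations such as sort and sum on top of CUDA.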
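
Since reduce (sum) is named as one of the two most useful operations, here is a minimal sketch of what a sum on the GPU might look like. The helper name `sum_on_device` is hypothetical and not from the original post; the sketch assumes the input starts in host memory as a `std::vector<int>` and that a 64-bit accumulator is wanted.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <vector>

// Hypothetical helper (not from the original post): sums host data on the GPU.
long long sum_on_device(const std::vector<int>& host_data)
{
  // Copy the input from host memory to device memory.
  thrust::device_vector<int> d_vec(host_data.begin(), host_data.end());

  // Reduce on the device. The 0LL initial value makes the accumulator a
  // long long, so large inputs do not overflow a 32-bit int.
  return thrust::reduce(d_vec.begin(), d_vec.end(), 0LL,
                        thrust::plus<long long>());
}
```

Calling `sum_on_device({1, 2, 3, 4})` from a `main` function in a `.cu` file compiled with `nvcc` should return 10. For small inputs like this the host-to-device copy will likely dominate, so the approach only pays off on large arrays.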

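Prerequisites

An NVIDIA GPU is required. For a server in an IDC data center, that probably means a Tesla card, which is a bit pricey O(∩_∩)O haha!
A working CUDA installation.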

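Changes in how you program

The conventions are the same as in CUDA programming: host means the CPU and device means the GPU. Data in memory is either host data or device data, and moving data between the two requires an explicit copy.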

Verifying sort performance

Let's try the sort example from the official site.

#include <iostream>
#include <ctime>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>
using namespace std;

int main(void)
{
  // generate 32M random numbers serially
  thrust::host_vector<int> h_vec(32 << 20);
  std::generate(h_vec.begin(), h_vec.end(), rand);
  clock_t startTime, endTime;
  // start timing; the interval covers both transfers and the sort, and the
  // first device allocation also pays for CUDA context initialization
  startTime = clock();

  // transfer data to the device
  thrust::device_vector<int> d_vec = h_vec;

  // sort data on the device (846M keys per second on GeForce GTX 480)
  thrust::sort(d_vec.begin(), d_vec.end());

  // transfer data back to host
  thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
  endTime = clock(); // stop timing
  cout << "The run time second is: " << (double)(endTime - startTime) / CLOCKS_PER_SEC << "s" << endl;
  return 0;
}

Compile and run

$ nvcc sort.cu -o sort
$ ./sort              
The run time second is: 0.419299s

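32M elements (32 << 20, about 33.5 million ints) were copied to the GPU, sorted, and copied back in roughly 420 ms. That would be a struggle on a CPU, so this is very useful for sorting workloads with strict real-time requirements.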
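
The comparison with the CPU is not measured in this post. A baseline along the following lines could be used to check it on your own machine. This is only a sketch: the helper names `make_random_input` and `cpu_sort_seconds` are hypothetical, and it times `std::sort` with `std::chrono::steady_clock` rather than the `clock()` call used in the GPU example.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: builds n pseudo-random ints, like the GPU example does.
std::vector<int> make_random_input(std::size_t n)
{
  std::vector<int> data(n);
  std::generate(data.begin(), data.end(), [] { return std::rand(); });
  return data;
}

// Hypothetical helper: sorts in place on the CPU and returns the elapsed
// wall-clock time in seconds.
double cpu_sort_seconds(std::vector<int>& data)
{
  auto start = std::chrono::steady_clock::now();
  std::sort(data.begin(), data.end());
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

To mirror the GPU run, call `make_random_input(32 << 20)` and pass the result to `cpu_sort_seconds` from `main`, building with optimizations enabled (for example `g++ -O2`). The result depends heavily on the CPU and compiler flags, so no reference number is given here.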
