Thrust Quick Start Tutorial (Part 2): Vectors

zerolover

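Thrust provides two vector containers: host_vector and device_vector. As the names suggest, host_vector is stored in host memory and device_vector is stored in GPU device memory. Like std::vector in the C++ STL, Thrust's vector containers are generic (they can hold any data type) and can be resized dynamically. The following source code shows how to use them.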

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <iostream>

int main(void)
{
  // H has storage for 4 integers
  thrust::host_vector<int> H(4);

  // initialize individual elements
  H[0] = 14;
  H[1] = 20;
  H[2] = 38;
  H[3] = 46;

  // H.size() returns the size of vector H
  std::cout << "H has size " << H.size() << std::endl;

  // print contents of H
  for(std::size_t i = 0; i < H.size(); i++)
  {
    std::cout << "H[" << i << "] = " << H[i] << std::endl;
  }

  // resize H
  H.resize(2);

  std::cout << "H now has size " << H.size() << std::endl;

  // Copy host_vector H to device_vector D
  thrust::device_vector<int> D = H;

  // elements of D can be modified
  D[0] = 99;
  D[1] = 88;

  // print contents of D
  for(std::size_t i = 0; i < D.size(); i++)
  {
    std::cout << "D[" << i << "] = " << D[i] << std::endl;
  }

  // H and D are automatically destroyed when the function returns
  return 0;
}

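As this example shows, the = operator can copy a host_vector to a device_vector (and vice versa). It can also copy a device_vector to a device_vector, or a host_vector to a host_vector. Note also that individual elements of a device_vector can be accessed with []. However, because each such access requires a call to cudaMemcpy(), it should be used sparingly. More efficient techniques for accessing elements are covered later. Initializing all elements of a vector to a specific value, or copying only some elements of one vector into another, are very common operations, and Thrust provides several ways to do them.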

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/fill.h>
#include <thrust/sequence.h>
#include <iostream>

int main(void)
{
  // initialize device_vector D with ten elements, all set to 1
  thrust::device_vector<int> D(10, 1);                       // 1 1 1 1 1 1 1 1 1 1

  // set the first seven elements of D to 9
  thrust::fill(D.begin(), D.begin() + 7, 9);                 // 9 9 9 9 9 9 9 1 1 1

  // initialize host_vector H with the first five elements of D
  thrust::host_vector<int> H(D.begin(), D.begin() + 5);      // 9 9 9 9 9

  // set the elements of H to 0, 1, 2, 3, ...
  thrust::sequence(H.begin(), H.end());                      // 0 1 2 3 4

  // copy all of H back to the beginning of D
  thrust::copy(H.begin(), H.end(), D.begin());               // 0 1 2 3 4 9 9 1 1 1

  // print D
  for(std::size_t i = 0; i < D.size(); i++)
  {
    std::cout << "D[" << i << "] = " << D[i] << std::endl;   // 0 1 2 3 4 9 9 1 1 1
  }
  }

  return 0;
}

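Here we have seen fill, copy, and sequence in use. The copy function can copy a range of host or device elements into another vector. Like its STL counterpart, thrust::fill sets a range of elements to a given value. thrust::sequence fills a range with equally spaced values, that is, an arithmetic sequence.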
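For comparison, the same steps can be written with the host-only STL counterparts. This is a minimal sketch and not part of the Thrust example: the function name stl_counterpart is hypothetical, and std::iota is used here as the STL analogue of thrust::sequence with its default start of 0 and step of 1.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: repeats the steps of the Thrust example on the host only.
std::vector<int> stl_counterpart()
{
  // ten elements, all set to 1
  std::vector<int> d(10, 1);

  // set the first seven elements to 9
  std::fill(d.begin(), d.begin() + 7, 9);

  // build h from the first five elements of d
  std::vector<int> h(d.begin(), d.begin() + 5);

  // overwrite h with 0, 1, 2, 3, 4
  std::iota(h.begin(), h.end(), 0);

  // copy all of h over the start of d
  std::copy(h.begin(), h.end(), d.begin());

  // std::copy does not grow its destination, so d keeps its ten elements
  assert(d.size() == 10);
  return d;
}
```

Calling stl_counterpart() yields 0 1 2 3 4 9 9 1 1 1, the same contents the Thrust example ends up with in D. The difference is that every step here runs on the host, whereas the Thrust calls pick host or device code based on the iterators they receive.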

Thrust Namespace

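Our examples use names such as thrust::host_vector and thrust::copy. The thrust:: prefix tells the C++ compiler to look the function or class up in the thrust namespace. Namespaces are a convenient way to avoid name collisions. For example, thrust::copy is distinct from std::copy in the STL, and C++ namespaces let us tell the two copy functions apart.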
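The idea can be illustrated without Thrust. In the sketch below, the namespace mylib, its copy function, and the helper copy_twice are all hypothetical names invented for this illustration; the point is only that the qualifier in front of copy decides which function is called.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical namespace with its own copy, unrelated to std::copy.
namespace mylib {
// Copies [first, last) to out and returns how many elements were copied.
template <typename InputIt, typename OutputIt>
int copy(InputIt first, InputIt last, OutputIt out)
{
  int n = 0;
  for (; first != last; ++first, ++out, ++n) *out = *first;
  return n;
}
}  // namespace mylib

// Calls both functions side by side; the qualifier selects the one intended.
int copy_twice(const std::vector<int>& src, std::vector<int>& a, std::vector<int>& b)
{
  assert(a.size() >= src.size() && b.size() >= src.size());
  std::copy(src.begin(), src.end(), a.begin());           // the STL version
  return mylib::copy(src.begin(), src.end(), b.begin());  // the mylib version
}
```

Both calls take the same arguments, yet there is no ambiguity because each name is fully qualified. thrust::copy and std::copy coexist in the same way.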

Iterators and Static Dispatching

In this section we have used expressions such as H.begin(), H.end(), and D.begin() + 7. The values returned by begin() and end() are called iterators in C++.

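A vector iterator is similar to a pointer into an array: it refers to one element. H.begin() is an iterator to the first element of H, and H.end() refers to the position one past the last element (not to the last element itself). Although vector iterators resemble pointers, they carry more information. Notice that when we called thrust::fill we did not have to say that it was operating on device_vector iterators. That information is already encoded in the type of the iterator returned by D.begin(), which differs from the type returned by H.begin().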
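The one-past-the-end convention can be seen with a plain std::vector. This is a small host-only sketch; the helper names sum_range and sum_first are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the half-open range [first, last).
// The loop stops when first == last, so last itself is never dereferenced.
template <typename Iterator>
int sum_range(Iterator first, Iterator last)
{
  int total = 0;
  for (; first != last; ++first) total += *first;
  return total;
}

// Sums the first n elements of v, using begin() + n as the end of the range.
int sum_first(const std::vector<int>& v, int n)
{
  assert(n >= 0 && static_cast<std::size_t>(n) <= v.size());
  return sum_range(v.begin(), v.begin() + n);
}
```

With a vector holding 14, 20, 38, 46, sum_range(v.begin(), v.end()) visits all four elements, and sum_first(v, 2) visits only the first two. This is the same way D.begin() + 7 marks the end of the range passed to thrust::fill.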

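When a Thrust function is called, the iterator type determines whether the host or the device implementation of the algorithm is used. Because this host/device dispatch is resolved at compile time, the process is known as static dispatching. It means the dispatch adds no overhead at runtime.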
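Compile-time selection by iterator type can be sketched in plain C++ with tag dispatch. This is not how Thrust is implemented internally; it only illustrates the general idea. Every name below (host_tag, device_tag, host_iter, device_iter, fill_impl, my_fill, check_dispatch) is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical tags standing in for the host and device systems.
struct host_tag {};
struct device_tag {};

// Hypothetical iterator types; each advertises its system as a nested type.
struct host_iter   { using system = host_tag; };
struct device_iter { using system = device_tag; };

// One overload per system; overload resolution picks one at compile time.
inline std::string fill_impl(host_tag)   { return "host"; }
inline std::string fill_impl(device_tag) { return "device"; }

// Public entry point: forwards to the overload that matches the iterator's tag.
template <typename Iterator>
std::string my_fill(Iterator)
{
  return fill_impl(typename Iterator::system{});
}

// The tags are distinct types, so the choice is a property of the types alone.
static_assert(!std::is_same<host_iter::system, device_iter::system>::value,
              "the two iterator types carry different tags");

// Small self-check that each iterator type reaches its own overload.
inline void check_dispatch()
{
  assert(my_fill(host_iter{}) == "host");
  assert(my_fill(device_iter{}) == "device");
}
```

The caller writes my_fill(it) in both cases and never names a backend. The compiler selects the overload from the iterator's type, so no branch is evaluated at runtime to make the choice.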

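You may wonder what happens when a raw pointer is passed to a Thrust function. As with the STL, Thrust allows this, and it dispatches the host implementation of the algorithm. If the pointer actually refers to device memory, wrap it with thrust::device_ptr before calling the function. For example: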

#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t N = 10;

    // obtain raw pointer to device memory
    int * raw_ptr;
    cudaMalloc((void **) &raw_ptr, N * sizeof(int));

    // wrap raw pointer with a device_ptr 
    thrust::device_ptr<int> dev_ptr = thrust::device_pointer_cast(raw_ptr);

    // use device_ptr in Thrust algorithms
    thrust::fill(dev_ptr, dev_ptr + N, (int) 0);

    // access device memory transparently through device_ptr
    dev_ptr[0] = 1;

    // free memory
    cudaFree(raw_ptr); // NOTE: memory obtained with cudaMalloc is released with cudaFree

    return 0;
}

To extract a raw pointer from a device_ptr, use raw_pointer_cast as follows:

#include <thrust/device_ptr.h>
#include <thrust/device_malloc.h>
#include <thrust/device_free.h>
#include <thrust/device_vector.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t N = 10;

    // create a device_ptr 
    thrust::device_ptr<int> dev_ptr = thrust::device_malloc<int>(N);

    // extract raw pointer from device_ptr
    int * raw_ptr = thrust::raw_pointer_cast(dev_ptr);

    // use raw_ptr in CUDA API functions
    cudaMemset(raw_ptr, 0, N * sizeof(int));

    // free memory
    thrust::device_free(dev_ptr); // NOTE: memory obtained with device_malloc is released with device_free

    // we can use the same approach for device_vector
    thrust::device_vector<int> d_vec(N);

    // note: d_vec.data() returns a device_ptr
    raw_ptr = thrust::raw_pointer_cast(d_vec.data()); // NOTE: or thrust::raw_pointer_cast(&d_vec[0])

    return 0;
}

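Another way iterators differ from pointers is that they can traverse many kinds of data structures. For example, the STL provides a linked-list container (std::list) whose iterators are bidirectional but not random access. Although Thrust does not provide device implementations of such containers, it is compatible with them.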

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <list>
#include <vector>

int main(void)
{
  // create an STL list with 4 values
  std::list<int> stl_list;

  stl_list.push_back(10);
  stl_list.push_back(20);
  stl_list.push_back(30);
  stl_list.push_back(40);

  // initialize a device_vector with the list
  thrust::device_vector<int> D(stl_list.begin(), stl_list.end());

  // copy a device_vector into an STL vector
  std::vector<int> stl_vector(D.size());
  thrust::copy(D.begin(), D.end(), stl_vector.begin());

  return 0;
}