Using the Thrust Library in CUDA

GaryW666

Last modified 2022-02-20 22:43:33

First published 2021-01-05 21:02:20

Article link: https://blog.csdn.net/qq_43707919/article/details/112251190

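Anyone who has worked with CUDA has probably met reduction algorithms: sums, maxima and minima, variances, standard deviations, and so on. To get good performance we often spend a lot of time tuning a hand-written reduction, dealing with how work is spread across threads, warp divergence, shared-memory bank conflicts, and the like. With small inputs it is still possible to reason calmly about every spot that might have room for optimization, but with large inputs it often feels like the kernel is never optimized enough. Don't panic: this is where CUDA shows its strength. The Thrust library that ships with CUDA provides ready-made implementations of these primitives, so in most cases you do not have to hand-optimize them yourself.
This post records the Thrust functions I have used recently, including reduce, sort, unique, and a few others.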

1. vector

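Before getting to the functions, a note on the vector containers that Thrust provides. Thrust defines two vector types, one for the host and one for the device, declared in thrust/host_vector.h and thrust/device_vector.h respectively. Declaring them is simple: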

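#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
thrust::host_vector<type> hvec;   // type is a placeholder for the element type
thrust::device_vector<type> dvec;
dvec = hvec; // a host vector can be assigned directly to a device vector; this does the job of cudaMemcpy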

Thrust also defines the device_ptr pointer type. When a raw pointer passed to a Thrust function points to device memory, it has to be wrapped in a device_ptr:

float array[6] = { 3, 1, 2, 3, 5, 4 };
float *dev_array = 0;
cudaMalloc(&dev_array, 6 * sizeof(float));
cudaMemcpy(dev_array, array, 6 * sizeof(float), cudaMemcpyHostToDevice);
thrust::device_ptr<float> dev_ptr(dev_array);
thrust::reduce(dev_ptr, dev_ptr + 6); // dev_array points to device memory, so it cannot be passed directly and must be wrapped first

thrust::host_vector<type> hvec;
thrust::device_vector<type> dvec;
dvec=hvec;
thrust::reduce(dvec.begin(), dvec.end()); // the arguments are already iterators, so they need not (and cannot) be wrapped in device_ptr

// Both call styles above also have host-side versions, where the pointers or iterators refer to host data
thrust::reduce(array, array + 6);
thrust::reduce(hvec.begin(), hvec.end());

// Use raw_pointer_cast to get the raw pointer back out of a device_ptr
float *raw_ptr = thrust::raw_pointer_cast(dev_ptr);

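The code above shows Thrust's data types and the four ways of calling a function. In the examples that follow, each function is shown with only one of these call styles; the others work analogously.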

2. reduce

thrust::reduce performs reductions and is defined in thrust/reduce.h. It returns a single value, and calling it is simple:

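thrust::device_ptr<float> dev_xptr(dev_xvalue);
// With no initial value the sum is accumulated in the element type (float here);
// pass 0.0 as a third argument to accumulate in double instead.
double sum = thrust::reduce(dev_xptr, dev_xptr + N);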

You can also define your own reduction with thrust::transform_reduce. The example below computes a variance:

// Define the per-element transformation you want to reduce over
struct squared_deviation
{
	squared_deviation(float m): mean(m){ }
	const float mean;
	__host__ __device__ float operator()(float data) const
	{
		return (data - mean) * (data - mean);
	}
};
// The mean has to be computed first, by summing with reduce
float mean = thrust::reduce(dev_ptr, dev_ptr + N) / N;
float var = thrust::transform_reduce(dev_ptr, dev_ptr + N, squared_deviation(mean), 0.0f, thrust::plus<float>()) / N;

3. sort

sort orders the elements of a range and is defined in thrust/sort.h. It is just as easy to use:

int A[6] = {1, 4, 2, 8, 5, 7};
thrust::sort(A, A + 6);
// result: {1, 2, 4, 5, 7, 8}; the default order is ascending

// sort_by_key sorts the values according to their keys
const int N = 6;
int keys[N] = {1, 4, 2, 8, 5, 7};
char values[N] = {'a', 'b', 'c', 'd', 'e', 'f'};
thrust::sort_by_key(keys, keys + N, values);
// result: keys{1, 2, 4, 5, 7, 8}
//         values{'a', 'c', 'b', 'e', 'f', 'd'}

sort also accepts a user-defined ordering. The example below sorts 3D points in ascending order, comparing x first, then y, then z:

struct compRule
{
	__host__ __device__
		bool operator()(const float3 &p1, const float3 &p2) const
	{
		// use strict < so that equal points compare false (strict weak ordering)
		if (p1.x != p2.x)
			return p1.x < p2.x;
		if (p1.y != p2.y)
			return p1.y < p2.y;
		return p1.z < p2.z;
	}
};
thrust::host_vector<float3> p(10);
thrust::sort(p.begin(), p.end(), compRule());
// input                      // output
//p[0] = { 0, 0, 0 };         p[0] = { 0, 0, 0 }; 
//p[1] = { 0, 1, 0 };         p[1] = { 0, 0, 1 };
//p[2] = { 1, 0, 0 };         p[2] = { 0, 1, 0 };
//p[3] = { 1, 0, 1 };         p[3] = { 0, 1, 1 };
//p[4] = { 1, 1, 1 };         p[4] = { 0, 1, 1 };
//p[5] = { 0, 1, 1 };         p[5] = { 1, 0, 0 };
//p[6] = { 0, 0, 1 };         p[6] = { 1, 0, 1 };
//p[7] = { 1, 1, 0 };         p[7] = { 1, 0, 1 };
//p[8] = { 0, 1, 1 };         p[8] = { 1, 1, 0 };
//p[9] = { 1, 0, 1 };         p[9] = { 1, 1, 1 };
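
A comparator handed to sort has to be a strict "less than": it must return false when the two points are equal, and it must return a value on every path. The sketch below shows one way to write such a lexicographic comparison. It is a host-only illustration using std::sort; Point3 is a hypothetical stand-in for float3, and LexLess and sorted_points are names I made up for the example.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical stand-in for CUDA's float3 so the sketch builds without CUDA.
struct Point3 { float x, y, z; };

// Lexicographic "less than" on (x, y, z). It returns false for equal points,
// which is what a strict weak ordering requires.
struct LexLess
{
    bool operator()(const Point3& a, const Point3& b) const
    {
        return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
    }
};

// Returns a copy of the input sorted in ascending (x, y, z) order.
std::vector<Point3> sorted_points(std::vector<Point3> pts)
{
    std::sort(pts.begin(), pts.end(), LexLess());
    return pts;
}
```

std::tie keeps the host version short. Inside a __device__ functor I would stick to explicit comparisons on x, y and z, since standard tuple support in device code depends on the toolchain.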

Finally, there is stable_sort. Unlike sort, it preserves the relative order of elements that compare equal.
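
What "stable" means is easiest to see when the elements carry extra data that the comparison ignores. The following host-only sketch uses std::stable_sort as a stand-in for the Thrust function; the Record type and the tags_after_stable_sort helper are made up for the illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: only key takes part in the comparison, tag rides along.
struct Record { int key; char tag; };

// Sorts by key with a stable sort and returns the tags in their final order.
std::string tags_after_stable_sort(std::vector<Record> recs)
{
    std::stable_sort(recs.begin(), recs.end(),
                     [](const Record& a, const Record& b) { return a.key < b.key; });
    std::string tags;
    for (const Record& r : recs) tags.push_back(r.tag);
    return tags;
}
```

With the input (2,'a'), (1,'b'), (2,'c'), (1,'d') the tags come out as "bdac": within each key, 'b' stays ahead of 'd' and 'a' stays ahead of 'c', exactly as in the input. A non-stable sort is free to swap them.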

4. max_element (min_element)

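As the names suggest, max_element and min_element find the largest and smallest element of a range. They are defined in thrust/extrema.h and are called like this: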

thrust::device_vector<type>::iterator iter = thrust::max_element(dvec.begin(), dvec.end());
// the return value is an iterator
int position = iter - dvec.begin(); // index of the maximum (minimum) element
type max_val = *iter; // value of the maximum (minimum) element
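
The iterator-to-index step can be tried on the host without a GPU. This sketch uses std::max_element as a stand-in for the Thrust call; the argmax helper is a name of my own, and it assumes a non-empty input.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (index, value) of the largest element. The input must be non-empty,
// otherwise dereferencing the returned iterator would be invalid.
std::pair<std::size_t, float> argmax(const std::vector<float>& data)
{
    auto it = std::max_element(data.begin(), data.end());
    // Subtracting begin() from the iterator gives the position.
    std::size_t pos = static_cast<std::size_t>(it - data.begin());
    return std::make_pair(pos, *it);
}
```

For { 3, 1, 2, 3, 5, 4 } this gives index 4 and value 5. When the maximum occurs more than once, std::max_element returns the first occurrence.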

5. unique

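unique collapses each run of consecutive equal elements to its first element. It is defined in thrust/unique.h and also accepts a user-defined equality predicate. Because only adjacent duplicates are detected, the input is normally sorted first. The example below removes duplicate points: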

struct is_sam
{
	__host__ __device__
		bool operator()(const float3 &p1, const float3 &p2)
	{
		return (p1.x==p2.x) && (p1.y==p2.y) && (p1.z==p2.z);
	}
};

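auto new_end = thrust::unique(p.begin(), p.end(), is_sam());
// unique only moves the kept elements to the front and returns the new logical end; it does not shrink the vector,
// so follow it with the vector's erase member to actually drop the leftover tail
p.erase(new_end, p.end());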

// input                     // after unique            // after erase
//p[0] = { 0, 0, 0 }         p[0] = { 0, 0, 0 }         p[0] = { 0, 0, 0 }
//p[1] = { 0, 0, 1 }         p[1] = { 0, 0, 1 }         p[1] = { 0, 0, 1 }
//p[2] = { 0, 1, 0 }         p[2] = { 0, 1, 0 }         p[2] = { 0, 1, 0 }
//p[3] = { 0, 1, 1 }         p[3] = { 0, 1, 1 }         p[3] = { 0, 1, 1 }
//p[4] = { 0, 1, 1 }         p[4] = { 1, 0, 0 }         p[4] = { 1, 0, 0 }
//p[5] = { 1, 0, 0 }         p[5] = { 1, 0, 1 }         p[5] = { 1, 0, 1 }
//p[6] = { 1, 0, 1 }         p[6] = { 1, 1, 0 }         p[6] = { 1, 1, 0 }
//p[7] = { 1, 0, 1 }         p[7] = { 1, 1, 1 }         p[7] = { 1, 1, 1 }
//p[8] = { 1, 1, 0 }         p[8] = { 0, 1, 1 }
//p[9] = { 1, 1, 1 }         p[9] = { 1, 0, 1 }
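
Note that the input in the listing above is already sorted. The sketch below shows why that matters: unique followed by erase removes only duplicates that sit next to each other, so unsorted data has to be sorted first. It is a host-only illustration with std::unique and std::sort standing in for the Thrust calls; Point3, same_point, lex_less, dedup_adjacent and dedup_all are all names made up for the example.

```cpp
#include <algorithm>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for CUDA's float3 so the sketch builds without CUDA.
struct Point3 { float x, y, z; };

static bool same_point(const Point3& a, const Point3& b)
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

static bool lex_less(const Point3& a, const Point3& b)
{
    return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
}

// unique + erase only: removes duplicates that are next to each other.
std::vector<Point3> dedup_adjacent(std::vector<Point3> pts)
{
    pts.erase(std::unique(pts.begin(), pts.end(), same_point), pts.end());
    return pts;
}

// Sort first so equal points become neighbours, then unique + erase.
std::vector<Point3> dedup_all(std::vector<Point3> pts)
{
    std::sort(pts.begin(), pts.end(), lex_less);
    return dedup_adjacent(std::move(pts));
}
```

Applied to the ten unsorted points from the sort section, dedup_adjacent leaves all ten in place because no two equal points are neighbours, while dedup_all returns the eight distinct points.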