Notes on Learning and Using TBB

larry_dongy

Category: Software and Libraries. Tags: c++, multithreading

Article link: https://blog.csdn.net/tfb760/article/details/106096245

While studying basalt I ran into TBB's parallel_for, so these are my notes on learning and using TBB.

1. Installation

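Download the source from GitHub: https://github.com/oneapi-src/oneTBB
Enter the source directory and run make. (This is the Makefile-based build of the older releases; recent oneTBB releases build with CMake instead.)
Enter the build directory, give every .sh file execute permission, and run the scripts.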

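make
cd build
chmod +x *.sh
sh generate_tbbvars.sh
# Source the script so its variables persist in the current shell;
# running it with `sh` only sets them in a child process.
. ./tbbvars.sh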

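This produces a number of *.so files. Copy include/tbb into the system's /usr/local/include, and copy all *.so and *.so.2 files from build into /usr/local/lib.
After installing, go to examples/pipeline/square and run make. Output like the following means the build setup works: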

g++ -O2 -DNDEBUG -o square square.cpp -ltbb
./square input.txt output.txt
serial run time = 0.32484
parallel run time = 0.167507

After that, add the header directory to the include path in your own CMakeLists and link against the corresponding .so files.

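Unresolved problem:
I still do not know where the CMake config files are supposed to go. A plain find_package(TBB) in my own CMakeLists cannot find TBB yet, so I have to add the paths by hand.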
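
Adding the paths by hand could look roughly like the fragment below. It is only a sketch: it assumes the headers and libraries were copied to /usr/local/include and /usr/local/lib as described above, and the project name tbb_demo and the source file main.cpp are placeholders, not names from my project.

```cmake
cmake_minimum_required(VERSION 3.5)
project(tbb_demo CXX)

# Assumption: libtbb.so was copied to /usr/local/lib during installation.
find_library(TBB_LIBRARY NAMES tbb HINTS /usr/local/lib)

# main.cpp is a placeholder for your own source file.
add_executable(tbb_demo main.cpp)

# Assumption: include/tbb was copied to /usr/local/include.
target_include_directories(tbb_demo PRIVATE /usr/local/include)
target_link_libraries(tbb_demo PRIVATE ${TBB_LIBRARY})
```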

2. tbb::parallel_for

Official Intel tutorial: https://software.intel.com/content/www/us/en/develop/documentation/tbb-tutorial/top/tutorial-developing-applications-using-parallelfor/develop-an-application-using-parallelfor.html

The first parameter of the call is a blocked_range object that describes the iteration space. blocked_range is a template class provided by the Intel TBB library. The constructor takes the following parameters: 1. The lower bound of the range. 2. The upper bound of the range.
The second parameter to the parallel_for function is the function object to be applied to each subrange.

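In short: the first argument of parallel_for is a blocked_range that gives the start and end of the data as a half-open interval [begin, end). The second argument is the function that does the actual work. parallel_for automatically splits the range into small chunks and processes them in parallel.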

My own example code, written by following the tutorial:

#include <iostream>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

using namespace std;
using namespace tbb;

void func(int v){
	cout << "v = " << v << endl;
}

class Apply{
public:
	int* my_a;
	void operator()(const tbb::blocked_range<size_t> &r) const {
		cout << "r begin: " << r.begin() << ", end: " << r.end() << endl;
		for(auto i=r.begin(); i!=r.end(); ++i){
			func(my_a[i]);
		}
	}
	Apply(int a[]){
		my_a = a;
	}
};

int main(void){
	const int N = 100;	// const, so that a[N] is a standard fixed-size array
	int a[N];
	for(int i=0; i<N; ++i)
		a[i] = i * 2;
	
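	// Version 1: a class with an overloaded operator()
	// parallel_for(blocked_range<size_t>(0, N), Apply(a));		
	
	// Version 2: a lambda (output from different threads may interleave on cout)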
	parallel_for(blocked_range<size_t>(0, N), [&](const tbb::blocked_range<size_t> &r){
		cout << "r begin: " << r.begin() << ", end: " << r.end() << endl;
		for(auto i=r.begin(); i!=r.end(); ++i){
			cout << " i = " << i << endl;
		}
	});
	return 0;
}

3. Using push_back

Updated 2020/5/24

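At first I called push_back on a std::vector inside the parallel loop, and the program kept crashing at run time. It turns out that std::vector is not thread-safe for concurrent push_back. Use TBB's concurrent_vector instead, from the <tbb/concurrent_vector.h> header.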

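Below is my ICP example. At first I used two concurrent_vectors and called push_back on each, and ICP kept producing inexplicable bugs. I searched for a long time and checked everything I could think of before finding that the two push_backs were the problem. Each push_back is thread-safe on its own, but the pair is not atomic: another thread can push in between, so the same index in the two vectors no longer holds one matched pair. Storing both points in a single Vec4f solved it. The run time did not drop below the single-threaded version, though; it went up. (In a test where I made the per-element work heavier, the TBB version did become faster.) TBB probably has some fixed overhead. So it was not much use here, but I now understand TBB better.

Truly exhausting...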

tbb::concurrent_vector<Vec4f> con_vec4f;
auto func = [&] (const tbb::blocked_range<int> &range){
	for(auto i=range.begin(); i!=range.end(); ++i){
		Point2f p = vp[i];
		double minDist = std::numeric_limits<double>::max();	// needs <limits>; a cap of 999 could leave minIndex at -1
		int minIndex = -1;
		for(int j=0; j<vq.size(); ++j){
			Point2f q = vq[j];
			double dist = (p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y);
			if(dist < minDist){
				minDist = dist;
				minIndex = j;
			}
		}
		// compare outliers.
		if (minDist < outlier_thresh * outlier_thresh){		
			// con_vp_match.push_back(p);			// two separate push_backs: the pairing between the two vectors can break
			// con_vq_match.push_back(vq[minIndex]);
			Vec4f vec4f(p.x, p.y, vq[minIndex].x, vq[minIndex].y);
			con_vec4f.push_back(vec4f);
		}
	}
};
tbb::blocked_range<int> range(0, vp.size());
tbb::parallel_for(range, func);