2.4 Choosing the Number of Threads at Runtime (C++ Concurrency in Action)

扮猪吃饺子

Category: C++ Concurrency in Action

Article link: https://blog.csdn.net/weixin_28712713/article/details/90762123

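std::thread::hardware_concurrency() returns the number of threads that can genuinely run concurrently in a program. On a multi-core system this may be the number of CPU cores. The value is only a hint, and when the information is not available the function returns 0.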
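
Because the return value may be 0, callers usually want a fallback. A minimal sketch of that pattern follows; the helper name usable_hardware_threads and its fallback parameter are made up for illustration and are not part of the standard library.

```cpp
#include <thread>

// Hypothetical helper (not a standard function): wraps
// std::thread::hardware_concurrency() and substitutes a caller-chosen
// fallback when the implementation reports 0 ("unknown").
unsigned int usable_hardware_threads(unsigned int fallback = 2)
{
    unsigned int const hint = std::thread::hardware_concurrency();
    return hint != 0 ? hint : fallback;
}
```

The default fallback of 2 simply mirrors the choice made in the listing in this section; any small positive number would do.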

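The following is a parallel version of std::accumulate. It splits the work into blocks, one per thread, and enforces a minimum number of elements per thread so that it does not launch too many threads. If the input range is empty, it returns init right away rather than throwing an exception: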


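#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <numeric>
#include <thread>
#include <vector>

// Function object run by each thread: sums [first,last) into result.
template<typename Iterator,typename T>
struct accumulate_block
{
	void operator()(Iterator first,Iterator last,T& result)
	{
		result = std::accumulate(first,last,result);
	}
};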

template<typename Iterator,typename T>
T parallel_accumulate(Iterator first,Iterator last,T init)
{
	unsigned long const length = std::distance(first,last);
	
	if(!length)			//1 empty range: nothing to add
		return init;
	
	unsigned long const min_per_thread = 25;
	unsigned long const max_threads = 
						(length + min_per_thread - 1)/min_per_thread; //2 round up
	unsigned long const hardware_threads = 
						std::thread::hardware_concurrency();
	unsigned long const num_threads = 		//3 fall back to 2 if the hint is 0
						std::min(hardware_threads != 0?hardware_threads:2,max_threads);
	unsigned long const block_size = length/num_threads;	//4
	
	std::vector<T> results(num_threads);	// one value-initialized slot per block
	std::vector<std::thread> threads(num_threads - 1); //5 the calling thread does the last block
	
	Iterator block_start = first;
	for(unsigned long i = 0; i < (num_threads - 1); ++i)
	{
		Iterator block_end = block_start;
		std::advance(block_end,block_size);		//6
		threads[i] = std::thread(accumulate_block<Iterator,T>(),  //7
					  block_start,block_end,std::ref(results[i]));
		block_start = block_end;	//8
	}
	
	accumulate_block<Iterator,T>()(
		block_start,last,results[num_threads - 1]);	//9 final block, including any remainder
		
	std::for_each(threads.begin(),threads.end(),
		std::mem_fn(&std::thread::join));		//10 wait for every spawned thread
	
	return std::accumulate(results.begin(),results.end(),init); //11
}

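int main()
{
	int num[10000];
	for(unsigned i = 0; i < 10000; ++i)
	{
		num[i] = i;
	}
	int result = 0;
	// The end pointer must be one past the last element, so num+10000 covers all 10000 values.
	result = parallel_accumulate(num,num+10000,1);
	std::cout << result << std::endl;	// prints 49995001 (sum of 0..9999 plus init = 1)
}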

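Program notes: if the input range is empty (1), the function returns init. Otherwise the range holds at least one element, and the total number of elements is divided by the minimum number of elements per thread, rounding up, to get the maximum number of threads worth launching (2). This avoids the overhead of starting more threads than the amount of work justifies.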

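The number of threads to launch is the smaller of this computed maximum and the number of hardware threads (3). Running more threads than the hardware supports would only add context-switching overhead.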

If std::thread::hardware_concurrency() returns 0, you substitute a number of your own choosing; this example uses 2.
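
The arithmetic in steps (2) and (3) can be isolated into a small function to make it easy to check by hand. This is only a sketch: choose_thread_count is a hypothetical name, the hardware thread count is passed in as a parameter so that the result is deterministic, and the default of 25 elements per thread is taken from the listing.

```cpp
#include <algorithm>

// Hypothetical helper mirroring steps (2) and (3) of the listing.
// hardware_threads stands in for std::thread::hardware_concurrency().
unsigned long choose_thread_count(unsigned long length,
                                  unsigned long hardware_threads,
                                  unsigned long min_per_thread = 25)
{
    if (length == 0)
        return 0; // empty range: no threads are needed at all

    // Ceiling division: the most threads the amount of work justifies.
    unsigned long const max_threads =
        (length + min_per_thread - 1) / min_per_thread;

    // A hint of 0 means "unknown", so fall back to 2 as the listing does.
    unsigned long const hw = hardware_threads != 0 ? hardware_threads : 2;

    return std::min(hw, max_threads);
}
```

For example, 10000 elements with 8 hardware threads gives 8 threads, 30 elements with 8 hardware threads gives only 2, and 5 elements on a 32-thread machine gives a single thread.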

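The number of elements each thread processes is the total number of elements divided by the number of threads (4). When the division is not exact, the leftover elements go to the final block.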
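
To see how the blocks fall out when the length is not a multiple of the thread count, the partitioning can be sketched with plain indices instead of iterators. The function split_into_blocks below is a hypothetical illustration, not code from the book; it returns half-open [begin, end) index pairs.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits `length` elements into `num_threads`
// half-open index ranges, using the same rule as the listing. Every block
// but the last has length / num_threads elements; the last block runs to
// the end of the range and therefore absorbs the remainder.
std::vector<std::pair<std::size_t, std::size_t>>
split_into_blocks(std::size_t length, std::size_t num_threads)
{
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    if (num_threads == 0)
        return blocks;

    std::size_t const block_size = length / num_threads;
    std::size_t start = 0;
    for (std::size_t i = 0; i + 1 < num_threads; ++i)
    {
        blocks.emplace_back(start, start + block_size);
        start += block_size;
    }
    blocks.emplace_back(start, length); // final block, handled by the caller
    return blocks;
}
```

With 10 elements and 4 threads, block_size is 2, so the blocks are [0,2), [2,4), [4,6) and [6,10): the last block is twice as long as the others.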

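Now that the number of threads is known, create a std::vector<T> for the intermediate results and a std::vector<std::thread> for the threads (5). The thread vector holds one fewer than num_threads, because the calling thread already exists and does its share of the work.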

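A simple loop launches the threads: the block_end iterator is advanced to the end of the current block (6), and a new thread is launched to accumulate that block (7). The end of the current block then becomes the start of the next one (8).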

After all the threads have been launched, the calling thread processes the final block itself (9).

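Once the final block has been accumulated, std::for_each (10) joins every spawned thread to wait for it to finish, and std::accumulate then adds up all the partial results (11).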
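
The join-then-combine pattern of steps (10) and (11) can also be written with a range-based for loop, which some readers find easier to follow than std::for_each with std::mem_fn. The sketch below is a standalone illustration with a made-up function, sum_of_squares_in_threads, in which each thread writes only to its own slot of the results vector.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical example: thread i stores i*i in its own slot, the caller
// joins every thread, and only then reads and sums the slots.
unsigned long sum_of_squares_in_threads(unsigned long n)
{
    std::vector<unsigned long> results(n); // value-initialized to 0
    std::vector<std::thread> threads;
    threads.reserve(n);

    for (unsigned long i = 0; i < n; ++i)
        threads.emplace_back([&results, i] { results[i] = i * i; });

    // Same effect as std::for_each(..., std::mem_fn(&std::thread::join)).
    for (std::thread& t : threads)
        if (t.joinable())
            t.join();

    return std::accumulate(results.begin(), results.end(), 0UL);
}
```

Reading results is safe only after every join has returned, which is why the summation comes last. Like the listing, this sketch assumes no operation throws; if a thread failed to start, the already running threads would not be joined.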
