OpenCV Parallel Acceleration: A Tutorial on parallel_for_ and ParallelLoopBody


LoveMIss-Y


Article link: https://blog.csdn.net/qq_27825451/article/details/103878676


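Preface: For basic loop computations, a plain loop is often not very efficient, even when it accesses the data through raw pointers. Parallel computation can improve throughput considerably, and many of OpenCV's own operations are parallelized internally. This article explains how to use parallel_for_ together with ParallelLoopBody. Most of the tutorials I found online are reposts of the same article and target fairly old versions, so I decided to write my own. This article uses OpenCV 4.1.1 and requires the header <opencv2/core/utility.hpp>.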

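1. Starting with overloading the call operator

When we call a function, we are using the parentheses operator (). A constructor call uses parentheses in the same way. A class can also overload this operator, as operator(), so that its objects can be called like functions. How is that done? Let's look at a small example: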

#include <iostream>

namespace myanimal
{
	class Animal
	{
	public:
		Animal(float weight_, int age_)
		{
			weight = weight_;
			age = age_;
		}
		void operator()(float height) const // overloaded call operator ()
		{
			std::cout << "the age is : " << age << std::endl;
			std::cout << "the weight is : " << weight << std::endl;
			std::cout << "the height is : " << height << std::endl;
		}

	private:
		float weight;
		int age;
	};
}

Now call it:

int main(int argc, char* argv[])
{
	myanimal::Animal animal(50.0, 25);  // create the object
	animal(100.0);                      // invoke the overloaded call operator () through the object

	getchar();
	return 0;
}
/*
the age is : 25
the weight is : 50
the height is : 100
*/

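Summary:

(1) The overloaded call operator behaves like a member function of the object. It is still invoked through the object, in the form "object(argument list)".

(2) The general form of the overload is "return_type operator()(parameter list)". In general the trailing const is optional, and the parameter list can be anything:

No parameters: call it as obj()

One parameter: call it as obj(arg1)

Several parameters: call it as obj(arg1, arg2, ...)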
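
The three call forms listed above can coexist in one class, because operator() can be overloaded on its parameter list like any other member function. The sketch below uses a hypothetical Scaler class that does not appear in the original article; it is only meant to illustrate the syntax.

```cpp
// Hypothetical functor for illustration only: one class may provide several
// operator() overloads that differ in their parameter lists.
class Scaler
{
public:
    explicit Scaler(double factor) : factor_(factor) {}

    // No parameters: called as obj()
    double operator()() const { return factor_; }

    // One parameter: called as obj(x)
    double operator()(double x) const { return factor_ * x; }

    // Several parameters: called as obj(x, y)
    double operator()(double x, double y) const { return factor_ * (x + y); }

private:
    double factor_;
};
```

With Scaler s(2.0), the expressions s(), s(3.0) and s(1.0, 2.0) select the three overloads in turn.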

2. General steps for using parallel_for_ with ParallelLoopBody

Usage generally follows three steps.

(1) Step 1: Define a class or struct that inherits from ParallelLoopBody, for example:

class MyParallelClass : public ParallelLoopBody
{};
struct MyParallelStruct : public ParallelLoopBody
{};

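(2) Step 2: Override the call operator () in that class or struct. Note that although an overloaded call operator can in general take any number of parameters, here it must take exactly one parameter of type Range (this is where it differs from an ordinary overload), because that is the signature parallel_for_ invokes. It must also be declared const, since it overrides the pure virtual ParallelLoopBody::operator()(const Range&) const:

void operator()(const Range& range) const
{
   // perform the "loop work" for [range.start, range.end) here
}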

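(3) Step 3: Run the loop in parallel with parallel_for_

First, look at the prototype of parallel_for_:

#include <opencv2/core/utility.hpp>   // this article uses OpenCV 4.1.1, which needs this header
CV_EXPORTS void parallel_for_(const Range& range, const ParallelLoopBody& body, double nstripes=-1.);

The parameters are:

const Range& range: the full index range to process. parallel_for_ splits it into sub-ranges and passes each one to the overloaded call operator as its Range argument.
const ParallelLoopBody& body: an object of your own class or struct derived from ParallelLoopBody.
double nstripes=-1: a hint for how many stripes the range should be split into. The default of -1 leaves the choice to OpenCV.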

How is it used? Like this:

parallel_for_(Range(start, end), MyParallelClass(constructor arguments));
// Range(start, end) is a Range object
// MyParallelClass(constructor arguments) is an object of a class derived from ParallelLoopBody

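A question

Earlier, running the body of an overloaded call operator required an explicit call of the form obj(argument list). By that logic, we would have to write:

MyParallelClass obj = MyParallelClass(constructor arguments);  // construct the object
obj(Range(start, end));   // call it

That works, but executed this way the body of the overloaded operator runs sequentially, with no concurrency, so a time-consuming task gains nothing. When we instead write

parallel_for_(Range(start, end), MyParallelClass(constructor arguments));

there is no explicit call to the overloaded operator, but parallel_for_ calls it implicitly on sub-ranges of the given Range and runs those calls concurrently.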
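
To make the "implicit call" less mysterious, here is a toy version of the idea written with std::thread. It is not how OpenCV implements parallel_for_ (OpenCV dispatches to whichever backend it was built with); ToyRange, ToyLoopBody, toyParallelFor and DoubleInPlace are all hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical stand-ins for cv::Range and cv::ParallelLoopBody.
struct ToyRange
{
    int start;
    int end;
};

struct ToyLoopBody
{
    virtual ~ToyLoopBody() {}
    virtual void operator()(const ToyRange& range) const = 0;
};

// Splits [range.start, range.end) into at most `workers` contiguous stripes
// and calls body(stripe) on a separate thread for each stripe.
inline void toyParallelFor(const ToyRange& range, const ToyLoopBody& body, int workers)
{
    const int total = range.end - range.start;
    if (total <= 0 || workers <= 0)
        return;
    const int chunk = (total + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> threads;
    for (int begin = range.start; begin < range.end; begin += chunk)
    {
        const ToyRange stripe = { begin, std::min(begin + chunk, range.end) };
        threads.emplace_back([&body, stripe]() { body(stripe); });
    }
    for (std::thread& t : threads)
        t.join();  // wait until every stripe has been processed
}

// Example body: doubles the elements of a vector that fall inside its stripe.
struct DoubleInPlace : ToyLoopBody
{
    explicit DoubleInPlace(std::vector<int>& values) : values_(&values) {}
    void operator()(const ToyRange& range) const override
    {
        for (int i = range.start; i < range.end; ++i)
            (*values_)[i] *= 2;
    }
    std::vector<int>* values_;
};
```

The point to take away is that the body never sees the whole range at once, so it must only touch the indices in the Range it is given, and different stripes must not write to the same memory.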

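3. Speed-up experiments with parallel_for_ and ParallelLoopBody

3.1 Implementation with a custom class

Task: compute the element-wise product of two Mat matrices, as shown below.

(1) Define a class that inherits from ParallelLoopBody and overloads the call operator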

#include <opencv2/core/utility.hpp>  // required header
#include <opencv2/opencv.hpp>

namespace cv
{
	namespace mygemm
	{
		// A parallel loop body class, modeled on an official answer.
		// Despite the name ParallelAdd, it computes an element-wise product.
		class ParallelAdd : public ParallelLoopBody
		{
		public:
			ParallelAdd(Mat& _src1,Mat& _src2,Mat _result)    // constructor
			{
				src1 = _src1;
				src2 = _src2;
				result = _result;
				CV_Assert((src1.rows == src2.rows) && (src1.cols == src2.cols));  // both inputs must have the same size
				rows = src1.rows;
				cols = src1.cols;
			} 
			
			void operator()(const Range& range) const // overloaded call operator ()
			{
				int step = (int)(result.step / result.elemSize1());// number of elements per row (cols*channels, same as step1)
				
				// Each column in the sub-range is processed top to bottom (float data assumed).
				for (int col = range.start; col < range.end; ++col)
				{
					float * pData = (float*)result.col(col).data;
					float * p1 = (float*)src1.col(col).data;
					float * p2 = (float*)src2.col(col).data;
					for (int row = 0; row < result.rows; ++row)
						pData[row*step] = p1[row*step] * p2[row*step];
				}
			}

		private:
			Mat src1;
			Mat src2;
			Mat result;
			int rows;
			int cols;
		};
    }
}

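The overloaded operator clearly contains a time-consuming operation. We now define two ways to compute the element-wise product of two Mat objects: one calls the loop body directly over the whole range, and the other processes it concurrently with parallel_for_. Each is wrapped in its own function: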

// Call obj() directly, without concurrency.
// result is taken by reference so that the caller receives the output.
void testParallelClassWithFor(Mat _src1,Mat _src2,Mat& result)
{
	result = Mat(_src1.rows, _src1.cols, _src1.type());
	int totalCols = _src1.cols;
	typedef cv::mygemm::ParallelAdd ParallelAdd;
	ParallelAdd add = ParallelAdd(_src1, _src2, result);
	add(Range(0, totalCols));  // direct call, no concurrency
}

// Implicit call through parallel_for_, with concurrency.
void testParallelClassWithParallel_for_(Mat _src1,Mat _src2,Mat& result)
{
	result = Mat(_src1.rows, _src1.cols, _src1.type());
	int totalCols = _src1.cols;
	typedef cv::mygemm::ParallelAdd ParallelAdd;
	parallel_for_(Range(0, totalCols), ParallelAdd(_src1,_src2,result));  // implicit call, concurrent
}

Now compare the running times:

#include <opencv2/opencv.hpp>
#include <time.h>

#include "my_gemm.hpp"  // my own header, containing the class and functions defined above

using namespace cv;
using namespace std;

int main(int argc, char* argv[])
{
	Mat testInput1 = Mat::ones(6400, 5400, CV_32F);
	Mat testInput2 = Mat::ones(6400, 5400, CV_32F);

	Mat result1, result2, result3;
	clock_t start, stop;

    //**************** Timing comparison ****************************

	start = clock();
	mygemm::testParallelClassWithFor(testInput1, testInput2, result1);
	stop = clock();
	cout << "Running time using \'general for \':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

	start = clock();
	mygemm::testParallelClassWithParallel_for_(testInput1, testInput2, result2);
	stop = clock();
	cout << "Running time using \'parallel for \':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

	start = clock();
	result3 = testInput1.mul(testInput2);
	stop = clock();
	cout << "Running time using \'mul function \':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

 
	getchar();
	return 0;
}
/*
Running time using 'general for ':645ms
Running time using 'parallel for ':449ms
Running time using 'mul function ':70ms
*/

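Summary: the concurrent parallel_for_ version is indeed faster than the direct call, by nearly 200 ms, but it is still slower than OpenCV's built-in mul function. OpenCV's library functions are not only parallelized but also use more powerful low-level optimizations. So whenever OpenCV provides a function for the job, prefer it, unless your own implementation is genuinely better than OpenCV's.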
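
One caveat about the measurements: the C standard defines clock() as processor time, and on platforms where that is summed over all threads a parallel run can look no faster than a serial one. A wall-clock timer avoids the ambiguity. The helper below is a minimal sketch using std::chrono::steady_clock; the name elapsedMs is hypothetical and it is not the timing code used for the numbers in this article.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with std::chrono::steady_clock.
inline double elapsedMs(const std::function<void()>& work)
{
    const auto begin = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

It could be used as elapsedMs([&] { result3 = testInput1.mul(testInput2); }), with the returned value printed in place of the clock() difference.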

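3.2 Implementation with a custom struct

Task: define a struct for parallel computation that cubes every element of a Mat.

(1) Define a struct that inherits from ParallelLoopBody and overloads the call operator, as follows: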

#include <cmath>  // std::pow

namespace cv
{
	namespace mygemm
	{
		struct ParallelPow:ParallelLoopBody// a loop body struct for use with parallel_for_
		{
			Mat* src;             // struct member: a pointer to a Mat
			ParallelPow(Mat& _src)// struct constructor
			{
				src = &_src;
			}
			void operator()(const Range& range) const
			{
				Mat& result = *src;
				int step = (int)(result.step / result.elemSize1());
				for (int col = range.start; col < range.end; ++col)
				{
					float* pData = (float*)result.col(col).data;
					for (int row = 0; row < result.rows; ++row)
						pData[row*step] = std::pow(pData[row*step], 3); // cube each element in place
				}
			}
		};	
	}
}

Next, define two functions: one cubes the elements by calling the loop body directly over the whole range, and the other does so concurrently with parallel_for_:

void testParallelStructWithFor(Mat _src)
{
	int totalCols = _src.cols;
	typedef cv::mygemm::ParallelPow ParallelPow;
	ParallelPow obj = ParallelPow(_src);
	obj(Range(0, totalCols));
}

void testParallelStructWithParallel_for(Mat _src)
{
	int totalCols = _src.cols;
	typedef cv::mygemm::ParallelPow ParallelPow;
	parallel_for_(Range(0, totalCols), ParallelPow(_src));
}

Now compare the performance:

#include <opencv2/opencv.hpp>
#include <time.h>

#include "my_gemm.hpp"

using namespace cv;
using namespace std;

int main(int argc, char* argv[])
{
	Mat testInput1 = Mat::ones(6400, 5400, CV_32F);
	Mat testInput2 = Mat::ones(6400, 5400, CV_32F);

	Mat result1, result2, result3;
	clock_t start, stop;

    
	start = clock();
	mygemm::testParallelStructWithFor(testInput1);
	stop = clock();
	cout << "Running time using \'general for \':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

	start = clock();
	mygemm::testParallelStructWithParallel_for(testInput1);
	stop = clock();
	cout << "Running time using \'parallel for \':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

	start = clock();
	testInput1.mul(testInput1).mul(testInput1);
	stop = clock();
	cout << "Running time using \'mul function \':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;
	

	getchar();
	return 0;
}
/*
Running time using 'general for ':881ms
Running time using 'parallel for ':195ms
Running time using 'mul function ':76ms
*/

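Parallel execution improves efficiency significantly here, but it is still slower than OpenCV's standard implementation.

Summary: when a task can be done with OpenCV's standard functions, avoid writing your own implementation. The standard OpenCV functions are optimized and run very efficiently.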
