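Project scenario:
The project uses OpenCV for image processing. One function has to process every image in a std::vector<cv::Mat>, but each image takes a long time, so the total processing time is long. Multithreading can speed this up, and I happened to find that OpenCV provides a cv::parallel_for_ function that makes this quick to implement.
Usage
The original code looks roughly like this: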
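std::vector<cv::Mat> res(images_path.size());
for (size_t i = 0; i < images_path.size(); ++i)
{
    const auto& path = images_path[i];
    // cv::IMREAD_GRAYSCALE has the value 0: load as a single-channel image.
    cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
    res[i] = img;
}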
After parallelization:
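#include <opencv2/core/parallel/parallel_backend.hpp>
#include <opencv2/core/utility.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>  // cv::imread
std::vector<cv::Mat> res(images_path.size());
// cv::Range takes int bounds, so cast the size explicitly.
cv::parallel_for_(cv::Range(0, static_cast<int>(images_path.size())), [&](const cv::Range& range) {
    for (int i = range.start; i < range.end; ++i)
    {
        const auto& path = images_path[i];
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        res[i] = img;  // each index is written by exactly one invocation
    }
});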
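Tips:
That is the simplest way to get cv::parallel_for_ working, but it leaves two minor questions open: 1) The parallel backends of cv::parallel_for_ include OpenMP, TBB, and plain native threads. Blog posts usually say that OpenCV picks the backend automatically, but how is it actually chosen? 2) Is a higher thread count always better?
Backend selection:
From the contents of <opencv2/core/parallel/parallel_backend.hpp>: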
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_CORE_PARALLEL_BACKEND_HPP
#define OPENCV_CORE_PARALLEL_BACKEND_HPP
#include "opencv2/core/cvdef.h"
#include <memory>
namespace cv { namespace parallel {
#ifndef CV_API_CALL
#define CV_API_CALL
#endif
/** @addtogroup core_parallel_backend
* @{
* API below is provided to resolve problem of CPU resource over-subscription by multiple thread pools from different multi-threading frameworks.
* This is common problem for cases when OpenCV compiled threading framework is different from the Users Applications framework.
*
* Applications can replace OpenCV `parallel_for()` backend with own implementation (to reuse Application's thread pool).
*
*
* ### Backend API usage examples
*
* #### Intel TBB
*
* - include header with simple implementation of TBB backend:
* @snippet parallel_backend/example-tbb.cpp tbb_include
* - execute backend replacement code:
* @snippet parallel_backend/example-tbb.cpp tbb_backend
* - configuration of compiler/linker options is responsibility of Application's scripts
*
* #### OpenMP
*
* - include header with simple implementation of OpenMP backend:
* @snippet parallel_backend/example-openmp.cpp openmp_include
* - execute backend replacement code:
* @snippet parallel_backend/example-openmp.cpp openmp_backend
* - Configuration of compiler/linker options is responsibility of Application's scripts
*
*
* ### Plugins support
*
* Runtime configuration options:
* - change backend priority: `OPENCV_PARALLEL_PRIORITY_<backend>=9999`
* - disable backend: `OPENCV_PARALLEL_PRIORITY_<backend>=0`
* - specify list of backends with high priority (>100000): `OPENCV_PARALLEL_PRIORITY_LIST=TBB,OPENMP`. Unknown backends are registered as new plugins.
*
*/
/** Interface for parallel_for backends implementations
*
* @sa setParallelForBackend
*/
class CV_EXPORTS ParallelForAPI
{
public:
virtual ~ParallelForAPI();
typedef void (CV_API_CALL *FN_parallel_for_body_cb_t)(int start, int end, void* data);
virtual void parallel_for(int tasks, FN_parallel_for_body_cb_t body_callback, void* callback_data) = 0;
virtual int getThreadNum() const = 0;
virtual int getNumThreads() const = 0;
virtual int setNumThreads(int nThreads) = 0;
virtual const char* getName() const = 0;
};
/** @brief Replace OpenCV parallel_for backend
*
* Application can replace OpenCV `parallel_for()` backend with own implementation.
*
* @note This call is not thread-safe. Consider calling this function from the `main()` before any other OpenCV processing functions (and without any other created threads).
*/
CV_EXPORTS void setParallelForBackend(const std::shared_ptr<ParallelForAPI>& api, bool propagateNumThreads = true);
/** @brief Change OpenCV parallel_for backend
*
* @note This call is not thread-safe. Consider calling this function from the `main()` before any other OpenCV processing functions (and without any other created threads).
*/
CV_EXPORTS_W bool setParallelForBackend(const std::string& backendName, bool propagateNumThreads = true);
//! @}
}} // namespace
#endif // OPENCV_CORE_PARALLEL_BACKEND_HPP
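As the header shows, OpenCV supports switching backends and provides an interface for setting the parallel backend. The comments also point to examples for the TBB and OpenMP backends (parallel_backend/example-tbb.cpp and parallel_backend/example-openmp.cpp). Taking parallel_backend/example-tbb.cpp as an example: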
#include "opencv2/core.hpp"
#include <iostream>
#include <chrono>
#include <thread>
//! [tbb_include]
#include "opencv2/core/parallel/backend/parallel_for.tbb.hpp"
//! [tbb_include]
namespace cv { // private.hpp
CV_EXPORTS const char* currentParallelFramework();
}
static
std::string currentParallelFrameworkSafe()
{
const char* framework = cv::currentParallelFramework();
if (framework)
return framework;
return std::string();
}
using namespace cv;
int main()
{
std::cout << "OpenCV builtin parallel framework: '" << currentParallelFrameworkSafe() << "' (nthreads=" << getNumThreads() << ")" << std::endl;
//! [tbb_backend]
cv::parallel::setParallelForBackend(std::make_shared<cv::parallel::tbb::ParallelForBackend>());
//! [tbb_backend]
std::cout << "New parallel backend: '" << currentParallelFrameworkSafe() << "'" << "' (nthreads=" << getNumThreads() << ")" << std::endl;
parallel_for_(Range(0, 20), [&](const Range range)
{
std::ostringstream out;
out << "Thread " << getThreadNum() << "(opencv=" << utils::getThreadID() << "): range " << range.start << "-" << range.end << std::endl;
std::cout << out.str() << std::flush;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
});
}
As you can see, we can both set the parallel backend and check which backend is currently in use; just follow the approach in the example.
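Also, using the TBB backend normally requires enabling the TBB option when building OpenCV, and the resulting DLL then uses the TBB backend by default. However, it seems that setting up the TBB headers and libraries directly in your own project and then selecting the backend as in example-tbb.cpp also switches to TBB; readers are welcome to try this themselves.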
Setting the number of threads:
cv::setNumThreads(6);
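If you do not set it, the default is the CPU's maximum thread count, which can push CPU usage quite high. I use an i5-14600K (6 P-cores and 8 E-cores, 20 threads in total). With the default setting I found that parallel efficiency dropped: the first few iterations were fast, then the time per iteration rose quickly. I do not know the cause. In the end, 6 threads worked best, which matches the number of P-cores on my CPU. I recommend running long, repeated tests with your real workload and picking the thread count that performs best.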
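Update: I was previously on OpenCV 4.7, where a high thread count caused the slowdown. I recently built OpenCV 4.10, which can use all threads without slowing down. If you are interested, test different versions yourself.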