Choosing the parallel backend for OpenCV's cv::parallel_for_

我记得呢

Last modified: 2024-09-11 16:51:07

Tags: opencv

First published: 2024-09-09 15:27:44

Article link: https://blog.csdn.net/qq_38429284/article/details/142058620

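Project scenario:

The project uses OpenCV for image processing. One function has to process every image in a std::vector<cv::Mat>, and because each image takes a long time, the whole pass is slow. Multithreading can speed this up, and I happened to find that OpenCV provides cv::parallel_for_, which makes this easy to do.

Usage

The original code looked roughly like this: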

// images_path is a container of image file paths.
std::vector<cv::Mat> res(images_path.size());
for (size_t i = 0; i < images_path.size(); ++i)
{
	const auto& path = images_path[i];
	cv::Mat img = cv::imread(path, 0);  // 0 == cv::IMREAD_GRAYSCALE
	res[i] = img;
}

After parallelizing:

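#include <opencv2/core/parallel/parallel_backend.hpp>
#include <opencv2/core/utility.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>  // cv::imread

// res is sized up front so that each iteration writes only to its own slot.
std::vector<cv::Mat> res(images_path.size());
cv::parallel_for_(cv::Range(0, static_cast<int>(images_path.size())), [&](const cv::Range& range) {
	// Each call receives one sub-range of the full index range.
	for (int i = range.start; i < range.end; ++i)
	{
		const auto& path = images_path[i];
		cv::Mat img = cv::imread(path, 0);  // 0 == cv::IMREAD_GRAYSCALE
		res[i] = img;
	}
});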
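The lambda above loops from range.start to range.end because it is called with a sub-range rather than a single index. The sketch below illustrates that idea without OpenCV. The helpers split_range and for_each_chunk are hypothetical names invented for this illustration, and the even split is an assumption: how a real backend partitions the range is up to that backend.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper (not an OpenCV function): split [start, end) into at most
// `chunks` contiguous, non-overlapping sub-ranges of near-equal size.
std::vector<std::pair<int, int>> split_range(int start, int end, int chunks)
{
    std::vector<std::pair<int, int>> out;
    const int total = end - start;
    if (total <= 0 || chunks <= 0)
        return out;
    chunks = std::min(chunks, total);  // never create empty chunks
    const int base = total / chunks;   // minimum number of items per chunk
    const int extra = total % chunks;  // the first `extra` chunks get one more item
    int cursor = start;
    for (int c = 0; c < chunks; ++c)
    {
        const int len = base + (c < extra ? 1 : 0);
        out.emplace_back(cursor, cursor + len);
        cursor += len;
    }
    return out;
}

// Call body(sub_start, sub_end) once per chunk. A real backend would hand each
// chunk to a worker thread; here the chunks run one after another for clarity.
template <typename Body>
void for_each_chunk(int start, int end, int chunks, Body body)
{
    for (const auto& r : split_range(start, end, chunks))
        body(r.first, r.second);
}
```

Because the chunks never overlap, a body that only writes to res[i] for indices inside its own sub-range touches elements no other chunk touches, which is why pre-sizing res and assigning by index works in the parallel version.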

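Tips:

That is all it takes to get the simplest cv::parallel_for_ working, but it leaves two minor questions: 1) The parallel backends of cv::parallel_for_ include OpenMP, TBB, and plain native threads. Blog posts usually say that OpenCV picks the backend automatically, but how does that selection work, and how can it be controlled? 2) Is a higher thread count always better?

Backend selection:

According to the contents of <opencv2/core/parallel/parallel_backend.hpp>: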

// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.

#ifndef OPENCV_CORE_PARALLEL_BACKEND_HPP
#define OPENCV_CORE_PARALLEL_BACKEND_HPP

#include "opencv2/core/cvdef.h"
#include <memory>

namespace cv { namespace parallel {
#ifndef CV_API_CALL
#define CV_API_CALL
#endif

/** @addtogroup core_parallel_backend
 * @{
 * API below is provided to resolve problem of CPU resource over-subscription by multiple thread pools from different multi-threading frameworks.
 * This is common problem for cases when OpenCV compiled threading framework is different from the Users Applications framework.
 *
 * Applications can replace OpenCV `parallel_for()` backend with own implementation (to reuse Application's thread pool).
 *
 *
 * ### Backend API usage examples
 *
 * #### Intel TBB
 *
 * - include header with simple implementation of TBB backend:
 *   @snippet parallel_backend/example-tbb.cpp tbb_include
 * - execute backend replacement code:
 *   @snippet parallel_backend/example-tbb.cpp tbb_backend
 * - configuration of compiler/linker options is responsibility of Application's scripts
 *
 * #### OpenMP
 *
 * - include header with simple implementation of OpenMP backend:
 *   @snippet parallel_backend/example-openmp.cpp openmp_include
 * - execute backend replacement code:
 *   @snippet parallel_backend/example-openmp.cpp openmp_backend
 * - Configuration of compiler/linker options is responsibility of Application's scripts
 *
 *
 * ### Plugins support
 *
 * Runtime configuration options:
 * - change backend priority: `OPENCV_PARALLEL_PRIORITY_<backend>=9999`
 * - disable backend: `OPENCV_PARALLEL_PRIORITY_<backend>=0`
 * - specify list of backends with high priority (>100000): `OPENCV_PARALLEL_PRIORITY_LIST=TBB,OPENMP`. Unknown backends are registered as new plugins.
 *
 */

/** Interface for parallel_for backends implementations
 *
 * @sa setParallelForBackend
 */
class CV_EXPORTS ParallelForAPI
{
public:
    virtual ~ParallelForAPI();

    typedef void (CV_API_CALL *FN_parallel_for_body_cb_t)(int start, int end, void* data);

    virtual void parallel_for(int tasks, FN_parallel_for_body_cb_t body_callback, void* callback_data) = 0;

    virtual int getThreadNum() const = 0;

    virtual int getNumThreads() const = 0;

    virtual int setNumThreads(int nThreads) = 0;

    virtual const char* getName() const = 0;
};

/** @brief Replace OpenCV parallel_for backend
 *
 * Application can replace OpenCV `parallel_for()` backend with own implementation.
 *
 * @note This call is not thread-safe. Consider calling this function from the `main()` before any other OpenCV processing functions (and without any other created threads).
 */
CV_EXPORTS void setParallelForBackend(const std::shared_ptr<ParallelForAPI>& api, bool propagateNumThreads = true);

/** @brief Change OpenCV parallel_for backend
 *
 * @note This call is not thread-safe. Consider calling this function from the `main()` before any other OpenCV processing functions (and without any other created threads).
 */
CV_EXPORTS_W bool setParallelForBackend(const std::string& backendName, bool propagateNumThreads = true);

//! @}
}}  // namespace
#endif  // OPENCV_CORE_PARALLEL_BACKEND_HPP

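So OpenCV does support switching backends and provides an interface for setting the parallel backend. The comments also point to examples for the TBB and OpenMP backends (parallel_backend/example-tbb.cpp and parallel_backend/example-openmp.cpp). Taking parallel_backend/example-tbb.cpp as an example: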

#include "opencv2/core.hpp"
#include <iostream>

#include <chrono>
#include <thread>

//! [tbb_include]
#include "opencv2/core/parallel/backend/parallel_for.tbb.hpp"
//! [tbb_include]

namespace cv { // private.hpp
CV_EXPORTS const char* currentParallelFramework();
}

static
std::string currentParallelFrameworkSafe()
{
    const char* framework = cv::currentParallelFramework();
    if (framework)
        return framework;
    return std::string();
}

using namespace cv;
int main()
{
    std::cout << "OpenCV builtin parallel framework: '" << currentParallelFrameworkSafe() << "' (nthreads=" << getNumThreads() << ")" << std::endl;

    //! [tbb_backend]
    cv::parallel::setParallelForBackend(std::make_shared<cv::parallel::tbb::ParallelForBackend>());
    //! [tbb_backend]

    std::cout << "New parallel backend: '" << currentParallelFrameworkSafe() << "'" << "' (nthreads=" << getNumThreads() << ")" << std::endl;

    parallel_for_(Range(0, 20), [&](const Range range)
    {
        std::ostringstream out;
        out << "Thread " << getThreadNum() << "(opencv=" << utils::getThreadID() << "): range " << range.start << "-" << range.end << std::endl;
        std::cout << out.str() << std::flush;

        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });
}

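As the example shows, we can not only set the parallel backend but also check which backend is currently in use. Just follow the approach in the example.

Also, to use the TBB backend you normally need to enable the TBB option when building OpenCV; the resulting DLL then uses the TBB backend by default. That said, it seems that configuring the TBB headers and libraries directly in your own project and then setting the backend the way example-tbb.cpp does can also switch to TBB. Readers are welcome to try this themselves.

Setting the number of threads: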

cv::setNumThreads(6);

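If you do not set it, the default is the maximum number of CPU threads, which can push CPU usage quite high. I use an i5-14600K (6 performance cores and 8 efficiency cores, 20 threads in total). With the default setting, parallel efficiency dropped: the first few iterations were fast, then the time per iteration rose quickly. I do not know the cause. In the end a value of 6 worked best, which happens to equal the number of performance cores on my CPU. I recommend running long, repeated tests on your real workload and picking the thread count that performs best.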
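The repeated-test advice can be turned into a small sweep over candidate thread counts. The following is only a sketch: pick_best_thread_count and its measure_ms callback are hypothetical names, not part of OpenCV, and the helper is kept free of OpenCV so that it can be checked on its own.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical helper (not part of OpenCV): return the candidate thread count
// with the lowest median time. measure_ms(n) is expected to configure n threads,
// run the workload once, and return the elapsed time in milliseconds.
int pick_best_thread_count(const std::vector<int>& candidates,
                           int repeats,
                           const std::function<double(int)>& measure_ms)
{
    int best = -1;
    double best_median = std::numeric_limits<double>::infinity();
    for (int n : candidates)
    {
        std::vector<double> samples;
        for (int r = 0; r < repeats; ++r)
            samples.push_back(measure_ms(n));
        if (samples.empty())
            continue;
        // The median (upper middle for even counts) is less sensitive than the
        // mean to a few unusually slow runs.
        std::sort(samples.begin(), samples.end());
        const double median = samples[samples.size() / 2];
        if (median < best_median)
        {
            best_median = median;
            best = n;
        }
    }
    return best;  // -1 if there were no candidates or repeats <= 0
}
```

In real use, measure_ms would call cv::setNumThreads(n), run the cv::parallel_for_ loop, and time it with std::chrono::steady_clock. Since the slowdown described above only appeared after the first few iterations, repeats should be large enough to get past that initial fast phase.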

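Update: I was previously using OpenCV 4.7, which slowed down when the thread count was set high. I recently built OpenCV 4.10, which can use all threads without slowing down. If you are interested, test different versions yourself.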
