Choosing the parallel backend for OpenCV's cv::parallel_for_

Project scenario:

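        The project uses OpenCV for image processing. One function has to process every image in a std::vector<cv::Mat>, and because each image takes a long time, the whole run is slow. Multithreading can speed this up, and I happened to find that OpenCV's cv::parallel_for_ gets this done quickly.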


Usage

         The original code looked roughly like this:

std::vector<cv::Mat> res(images_path.size());
for (size_t i = 0; i < images_path.size(); ++i)
{
	const auto& path = images_path[i];
	// Flag 0 is cv::IMREAD_GRAYSCALE: load as a single-channel image
	cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
	res[i] = img;
}

         After parallelizing:

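#include <opencv2/core/parallel/parallel_backend.hpp>  // only needed for setParallelForBackend
#include <opencv2/core/utility.hpp>                    // cv::parallel_for_
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>                       // cv::imread

std::vector<cv::Mat> res(images_path.size());
cv::parallel_for_(cv::Range(0, static_cast<int>(images_path.size())), [&](const cv::Range& range) {
	// Each call receives a sub-range; every index writes only its own slot in res
	for (int i = range.start; i < range.end; ++i)
	{
		const auto& path = images_path[i];
		cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
		res[i] = img;
	}
});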
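
         To make the role of the cv::Range argument clearer, here is a minimal sketch in standard C++ only. It is not OpenCV's implementation; it just illustrates the general idea of splitting an index range into contiguous chunks and handing each chunk to a worker, which is why the body needs its own inner for loop. The names chunkedParallelFor and squareAll are hypothetical and exist only for this illustration, and real backends typically reuse a thread pool instead of spawning threads on every call.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, not an OpenCV API: split [begin, end) into at most
// nThreads contiguous chunks and run body(chunkBegin, chunkEnd) on each chunk.
void chunkedParallelFor(int begin, int end, int nThreads,
                        const std::function<void(int, int)>& body)
{
    const int total = end - begin;
    if (total <= 0)
        return;
    nThreads = std::max(1, std::min(nThreads, total));
    const int chunk = (total + nThreads - 1) / nThreads;  // ceiling division

    std::vector<std::thread> workers;
    for (int s = begin; s < end; s += chunk)
    {
        const int e = std::min(s + chunk, end);
        workers.emplace_back([&body, s, e] { body(s, e); });
    }
    for (std::thread& w : workers)
        w.join();  // wait for every chunk before returning
}

// Example body: each index writes only its own output slot, so no locking is needed.
std::vector<int> squareAll(const std::vector<int>& in, int nThreads)
{
    std::vector<int> out(in.size());
    chunkedParallelFor(0, static_cast<int>(in.size()), nThreads,
        [&](int s, int e) {
            for (int i = s; i < e; ++i)
                out[i] = in[i] * in[i];
        });
    return out;
}
```

         The pattern matches the image-loading loop above: the result vector is sized up front, and each index touches only its own element, so the chunks never write to the same slot.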

Tips:

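          That is the simplest use of cv::parallel_for_. Two minor questions remain: 1) cv::parallel_for_ can run on OpenMP, TBB, or plain native threads as its backend. Blog posts usually say that OpenCV picks the backend automatically, but how does that selection actually work? 2) Is a higher thread count always better?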


Backend selection:

        According to the contents of <opencv2/core/parallel/parallel_backend.hpp>:

// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.

#ifndef OPENCV_CORE_PARALLEL_BACKEND_HPP
#define OPENCV_CORE_PARALLEL_BACKEND_HPP

#include "opencv2/core/cvdef.h"
#include <memory>

namespace cv { namespace parallel {
#ifndef CV_API_CALL
#define CV_API_CALL
#endif

/** @addtogroup core_parallel_backend
 * @{
 * API below is provided to resolve problem of CPU resource over-subscription by multiple thread pools from different multi-threading frameworks.
 * This is common problem for cases when OpenCV compiled threading framework is different from the Users Applications framework.
 *
 * Applications can replace OpenCV `parallel_for()` backend with own implementation (to reuse Application's thread pool).
 *
 *
 * ### Backend API usage examples
 *
 * #### Intel TBB
 *
 * - include header with simple implementation of TBB backend:
 *   @snippet parallel_backend/example-tbb.cpp tbb_include
 * - execute backend replacement code:
 *   @snippet parallel_backend/example-tbb.cpp tbb_backend
 * - configuration of compiler/linker options is responsibility of Application's scripts
 *
 * #### OpenMP
 *
 * - include header with simple implementation of OpenMP backend:
 *   @snippet parallel_backend/example-openmp.cpp openmp_include
 * - execute backend replacement code:
 *   @snippet parallel_backend/example-openmp.cpp openmp_backend
 * - Configuration of compiler/linker options is responsibility of Application's scripts
 *
 *
 * ### Plugins support
 *
 * Runtime configuration options:
 * - change backend priority: `OPENCV_PARALLEL_PRIORITY_<backend>=9999`
 * - disable backend: `OPENCV_PARALLEL_PRIORITY_<backend>=0`
 * - specify list of backends with high priority (>100000): `OPENCV_PARALLEL_PRIORITY_LIST=TBB,OPENMP`. Unknown backends are registered as new plugins.
 *
 */

/** Interface for parallel_for backends implementations
 *
 * @sa setParallelForBackend
 */
class CV_EXPORTS ParallelForAPI
{
public:
    virtual ~ParallelForAPI();

    typedef void (CV_API_CALL *FN_parallel_for_body_cb_t)(int start, int end, void* data);

    virtual void parallel_for(int tasks, FN_parallel_for_body_cb_t body_callback, void* callback_data) = 0;

    virtual int getThreadNum() const = 0;

    virtual int getNumThreads() const = 0;

    virtual int setNumThreads(int nThreads) = 0;

    virtual const char* getName() const = 0;
};

/** @brief Replace OpenCV parallel_for backend
 *
 * Application can replace OpenCV `parallel_for()` backend with own implementation.
 *
 * @note This call is not thread-safe. Consider calling this function from the `main()` before any other OpenCV processing functions (and without any other created threads).
 */
CV_EXPORTS void setParallelForBackend(const std::shared_ptr<ParallelForAPI>& api, bool propagateNumThreads = true);

/** @brief Change OpenCV parallel_for backend
 *
 * @note This call is not thread-safe. Consider calling this function from the `main()` before any other OpenCV processing functions (and without any other created threads).
 */
CV_EXPORTS_W bool setParallelForBackend(const std::string& backendName, bool propagateNumThreads = true);

//! @}
}}  // namespace
#endif  // OPENCV_CORE_PARALLEL_BACKEND_HPP

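        This shows that OpenCV supports switching backends and exposes an interface for setting the parallel backend. The comments also point to examples for the TBB and OpenMP backends (parallel_backend/example-tbb.cpp and parallel_backend/example-openmp.cpp). Taking parallel_backend/example-tbb.cpp as the example: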

#include "opencv2/core.hpp"
#include <iostream>

#include <chrono>
#include <thread>

//! [tbb_include]
#include "opencv2/core/parallel/backend/parallel_for.tbb.hpp"
//! [tbb_include]

namespace cv { // private.hpp
CV_EXPORTS const char* currentParallelFramework();
}

static
std::string currentParallelFrameworkSafe()
{
    const char* framework = cv::currentParallelFramework();
    if (framework)
        return framework;
    return std::string();
}

using namespace cv;
int main()
{
    std::cout << "OpenCV builtin parallel framework: '" << currentParallelFrameworkSafe() << "' (nthreads=" << getNumThreads() << ")" << std::endl;

    //! [tbb_backend]
    cv::parallel::setParallelForBackend(std::make_shared<cv::parallel::tbb::ParallelForBackend>());
    //! [tbb_backend]

    std::cout << "New parallel backend: '" << currentParallelFrameworkSafe() << "'" << "' (nthreads=" << getNumThreads() << ")" << std::endl;

    parallel_for_(Range(0, 20), [&](const Range range)
    {
        std::ostringstream out;
        out << "Thread " << getThreadNum() << "(opencv=" << utils::getThreadID() << "): range " << range.start << "-" << range.end << std::endl;
        std::cout << out.str() << std::flush;

        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });
}

         As you can see, we can both set the parallel backend and check which backend is currently in use. Following the example is all it takes.

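        Also, to use the TBB backend you normally need to enable the TBB option when building OpenCV; the resulting DLL then uses the TBB backend by default. That said, it seems that setting up the TBB headers and libraries directly in your project and then setting the backend as in example-tbb.cpp also switches to TBB. Readers are welcome to try it themselves.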


Setting the number of threads:

cv::setNumThreads(6);

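        If you do not set it, the default is the CPU's maximum thread count, which can push CPU usage quite high. I use an i5-14600K (6 P-cores and 8 E-cores, 20 threads in total). With the default setting, parallel efficiency dropped: the first few iterations were fast, then the time per iteration rose quickly. I do not know why. In the end a value of 6 worked best, which happens to match the number of P-cores. I recommend running long, repeated tests on your real workload and picking the thread count that performs best.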

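        Update: I was previously using OpenCV 4.7, which slowed down when the OpenCV thread count was set high. I recently built OpenCV 4.10, which can use all threads without slowing down. If you are interested, test different versions yourself.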
