OpenCL: Simplifying kernel launch code in C++ with cl::make_kernel

Copyright notice: this is an original article by the blogger; please cite the source address when reposting. https://blog.csdn.net/10km/article/details/50767201

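My previous post, "OpenCL: Image scaling with bilinear interpolation in C++", introduced a simple image scaling function.
The snippet is shown below. As you can see, running the kernel takes two steps: first set the kernel arguments, then call enqueueNDRangeKernel to launch it.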

/* Scale the image (bilinear interpolation) and return the scaled image data */
gray_matrix_cl gray_matrix_cl::zoom(size_t dst_width, size_t dst_height, const facecl_context& context)const {
    gray_matrix_cl dst_matrix(dst_width, dst_height);
    auto kernel = context.getKernel(KERNEL_NAME(image_scaling));// get the already-compiled cl::Kernel
    auto command_queue = global_facecl_context.getCommandQueue();// get the cl::CommandQueue
    this->upload(command_queue);// upload the source image data to the OpenCL device
    cl_float widthNormalizationFactor = 1.0f / dst_width;
    cl_float heightNormalizationFactor = 1.0f / dst_height;
    // set the kernel arguments, one setArg call per kernel parameter
    kernel.setArg(0, cl_img);
    kernel.setArg(1, dst_matrix.cl_img);
    kernel.setArg(2, widthNormalizationFactor);
    kernel.setArg(3, heightNormalizationFactor);
    const cl::NDRange global(dst_width, dst_height);
    // launch the kernel: no offset (cl::NullRange), one work-item per destination pixel
    command_queue.enqueueNDRangeKernel(kernel, cl::NullRange, global);
    command_queue.finish();// wait for the kernel to finish
    dst_matrix.download(command_queue);// download the result data from the OpenCL device
    return dst_matrix;// a local returned by name is moved implicitly; std::move is not needed
}

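In the code above there is one setArg line for every kernel parameter, which gets tedious to write. A closer look at the OpenCL C++ interface shows that cl.hpp already provides a template functor, cl::make_kernel, that simplifies kernel invocation.
The code below rewrites the zoom function above to use cl::make_kernel.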

/* Scale the image (bilinear interpolation) */
gray_matrix_cl gray_matrix_cl::zoom(size_t dst_width, size_t dst_height, const facecl_context& context)const {
    gray_matrix_cl dst_matrix(dst_width, dst_height);
    auto command_queue = global_facecl_context.getCommandQueue();// get the cl::CommandQueue
    this->upload(command_queue);// upload the source image data to the OpenCL device
    cl_float widthNormalizationFactor = 1.0f / dst_width;
    cl_float heightNormalizationFactor = 1.0f / dst_height;
    // construct a cl::make_kernel object and call it to run the kernel
    cl::make_kernel<cl::Image2D,cl::Image2D,cl_float,cl_float>
        (context.getKernel(KERNEL_NAME(image_scaling)))// get the already-compiled cl::Kernel
        (cl::EnqueueArgs(command_queue,cl::NDRange( dst_width, dst_height )),
        cl_img,dst_matrix.cl_img,
        widthNormalizationFactor,
        heightNormalizationFactor);
    command_queue.finish(); // wait for the kernel to finish
    dst_matrix.download(command_queue);// download the result data from the OpenCL device
    return dst_matrix;// a local returned by name is moved implicitly; std::move is not needed
}

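This way a single statement both sets the kernel arguments and launches the kernel, which leaves fewer places for mistakes (though admittedly that one statement is rather long).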
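To make the idea concrete, here is a minimal sketch of how a functor can bind its arguments by position and then enqueue. It is not the cl.hpp implementation: MockKernel and MockKernelFunctor are hypothetical names, the mock only records calls so that the sketch runs without an OpenCL device, and it uses C++11 variadic templates for brevity.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for cl::Kernel: it records each setArg call
// instead of talking to an OpenCL device.
struct MockKernel {
    std::vector<std::string> log;

    template <typename T>
    void setArg(unsigned index, const T&) {
        log.push_back("setArg(" + std::to_string(index) + ")");
    }
};

// Hypothetical functor in the spirit of cl::make_kernel: the template
// parameters fix the argument types, and operator() binds the values
// in declaration order before "launching".
template <typename... Ts>
class MockKernelFunctor {
public:
    explicit MockKernelFunctor(MockKernel& kernel) : kernel_(kernel) {}

    void operator()(const Ts&... args) {
        bind(0, args...);
        // A real functor would call enqueueNDRangeKernel here.
        kernel_.log.push_back("enqueue");
    }

private:
    // End of recursion: no arguments left to bind.
    void bind(unsigned) {}

    // Bind the first remaining argument at `index`, then recurse on the rest.
    template <typename Head, typename... Tail>
    void bind(unsigned index, const Head& head, const Tail&... tail) {
        kernel_.setArg(index, head);
        bind(index + 1, tail...);
    }

    MockKernel& kernel_;
};
```

Because the argument types are part of the functor type, passing the wrong number of arguments or an unconvertible type becomes a compile-time error, whereas a mistyped index in a hand-written setArg call only shows up at run time.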

Below is the documentation for the cl::make_kernel constructors.

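/*
Creates a kernel functor taking at least one and at most 32 arguments.
T0 to T31 are the kernel argument types, in the same order as the parameters in the kernel function declaration.
program is the cl::Program object in which the kernel is defined.
name is the name of the kernel.
err, if not NULL, receives the error code on failure.
*/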

template <typename T0, typename T1 = detail::NullType, typename T2 = detail::NullType,
typename T3 = detail::NullType, typename T4 = detail::NullType, typename T5 = detail::NullType,
typename T6 = detail::NullType, typename T7 = detail::NullType, typename T8 = detail::NullType,
typename T9 = detail::NullType, typename T10 = detail::NullType, typename T11 = detail::NullType,
typename T12 = detail::NullType, typename T13 = detail::NullType, typename T14 = detail::NullType,
typename T15 = detail::NullType, typename T16 = detail::NullType, typename T17 = detail::NullType,
typename T18 = detail::NullType, typename T19 = detail::NullType, typename T20 = detail::NullType,
typename T21 = detail::NullType, typename T22 = detail::NullType, typename T23 = detail::NullType,
typename T24 = detail::NullType, typename T25 = detail::NullType, typename T26 = detail::NullType,
typename T27 = detail::NullType, typename T28 = detail::NullType, typename T29 = detail::NullType,
typename T30 = detail::NullType, typename T31 = detail::NullType>
struct make_kernel : detail::functionImplementation<T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10,
T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31>
cl::make_kernel::make_kernel(
const Program &program,
const STRING_CLASS name,
cl_int *err = NULL)
/* Much the same as the previous constructor,
   except that it takes a cl::Kernel instead of the cl::Program and STRING_CLASS arguments.
*/
template <typename T0, typename T1 = detail::NullType, typename T2 = detail::NullType,
typename T3 = detail::NullType, typename T4 = detail::NullType, typename T5 = detail::NullType,
typename T6 = detail::NullType, typename T7 = detail::NullType, typename T8 = detail::NullType,
typename T9 = detail::NullType, typename T10 = detail::NullType, typename T11 = detail::NullType,
typename T12 = detail::NullType, typename T13 = detail::NullType, typename T14 = detail::NullType,
typename T15 = detail::NullType, typename T16 = detail::NullType, typename T17 = detail::NullType,
typename T18 = detail::NullType, typename T19 = detail::NullType, typename T20 = detail::NullType,
typename T21 = detail::NullType, typename T22 = detail::NullType, typename T23 = detail::NullType,
typename T24 = detail::NullType, typename T25 = detail::NullType, typename T26 = detail::NullType,
typename T27 = detail::NullType, typename T28 = detail::NullType, typename T29 = detail::NullType,
typename T30 = detail::NullType, typename T31 = detail::NullType>
struct make_kernel : detail::functionImplementation<T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10,
T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31>
cl::make_kernel::make_kernel(
const Kernel kernel,
cl_int *err = NULL)
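The declarations above default every unused type slot to detail::NullType, a common way to emulate a variable-length parameter list without variadic templates. The following is a minimal sketch of that sentinel technique, cut down to four slots. All names in it (NullType, IsUsed, ArgCount) are hypothetical and it is not taken from cl.hpp; it only shows how a template can tell how many slots the caller actually filled.

```cpp
// Hypothetical sentinel type marking an unused template slot.
struct NullType {};

// 1 for a real argument type, 0 for the sentinel.
template <typename T> struct IsUsed           { static const int value = 1; };
template <>           struct IsUsed<NullType> { static const int value = 0; };

// Up to four slots; every slot after the first defaults to the sentinel,
// so callers only spell out the types they need.
template <typename T0,
          typename T1 = NullType,
          typename T2 = NullType,
          typename T3 = NullType>
struct ArgCount {
    static const int value = IsUsed<T0>::value + IsUsed<T1>::value +
                             IsUsed<T2>::value + IsUsed<T3>::value;
};
```

With this in place, ArgCount<int, float>::value is 2, which is the kind of information a functor needs in order to bind only the slots the caller supplied.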

cl::make_kernel also overloads operator() for invoking the kernel, in the following forms.

Event operator() ( EnqueueArgs& args,
T0 t0, T1 t1 = NullType, …, T31 t31 = NullType)
Event operator() ( EnqueueArgs& args,
const Event& waitEvent,
T0 t0, T1 t1 = NullType, …, T31 t31 = NullType )
Event operator() ( EnqueueArgs &args,
const VECTOR_CLASS<Event>& waitEvents,
T0 t0, T1 t1 = NullType, …, T31 t31 = NullType )

In the rewritten zoom function above, the following code is the argument list passed to operator():

(cl::EnqueueArgs(command_queue,cl::NDRange( dst_width, dst_height )),
        cl_img,dst_matrix.cl_img,
        widthNormalizationFactor,
        heightNormalizationFactor)

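In that snippet, the part cl::EnqueueArgs(command_queue,cl::NDRange( dst_width, dst_height ))
uses the cl::EnqueueArgs class.
cl::EnqueueArgs bundles the parameters of a kernel dispatch: the queue, the NDRange values (offset, global, local) and any events to wait on. Its constructors, listed below, form an orthogonal set of overloads over those parameters. If a single event is passed, EnqueueArgs builds a one-element event list for the enqueue; if a vector of events is passed, it becomes the list of input event dependencies. The dispatch goes through the default queue unless a queue is specified.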

The constructors for EnqueueArgs are:
cl::EnqueueArgs::EnqueueArgs(NDRange global)
cl::EnqueueArgs::EnqueueArgs(NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(NDRange offset, NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(Event e, NDRange global)
cl::EnqueueArgs::EnqueueArgs(Event e, NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global)
cl::EnqueueArgs::EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global,
NDRange local)
cl::EnqueueArgs::EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange offset,
NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue, NDRange global)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global,
NDRange local)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue, Event e, NDRange global)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue, Event e, NDRange global,
NDRange local)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue, Event e, NDRange offset,
NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue,
const VECTOR_CLASS<Event> &events, NDRange global)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue,
const VECTOR_CLASS<Event> &events, NDRange global, NDRange local)
cl::EnqueueArgs::EnqueueArgs(CommandQueue &queue,
const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global,
NDRange local)
global is a global work size corresponding to the global_work_size argument of the underlying OpenCL EnqueueNDRangeKernel call.
local is a local work size corresponding to the local_work_size argument of the underlying OpenCL EnqueueNDRangeKernel call. If local is not specified, a NULL local_work_size is used.
offset is an offset corresponding to the global_work_offset argument of the underlying OpenCL EnqueueNDRangeKernel call. If offset is not specified, a NULL global_work_offset is used.
e is an Event that must be completed before the EnqueueArgs may be executed, and similarly
events is a list of events that must be completed before the EnqueueArgs may be executed. If neither e nor events is specified, the EnqueueArgs is executed without waiting on any events.
queue is a CommandQueue to which the EnqueueArgs is submitted. If queue is not specified, EnqueueArgs is submitted to the default queue.
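One practical consequence of passing local: in OpenCL 1.x each dimension of global must be an exact multiple of the matching dimension of local, otherwise the enqueue fails with CL_INVALID_WORK_GROUP_SIZE. A common workaround is to round global up before building the EnqueueArgs. The helper below is a minimal sketch of that rounding; round_up_global is a hypothetical name, not part of cl.hpp.

```cpp
#include <cstddef>

// Hypothetical helper: the smallest multiple of `local` that is >= `global`.
// `local` must be non-zero.
constexpr std::size_t round_up_global(std::size_t global, std::size_t local) {
    return ((global + local - 1) / local) * local;
}
```

For example, round_up_global(1000, 64) yields 1024. When the range is padded like this, the kernel needs a bounds check so that the extra work-items do nothing.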

All OpenCL function descriptions in this post come from the official OpenCL document opencl-cplusplus-1.2.pdf.


For further improvements to the way cl::make_kernel is called, see my other post, "OpenCL: The evolution of cl::make_kernel".
