Mali OpenCL SDK v1.1.0 教程样例之六“索贝尔滤波器”

最新推荐文章于 2023-02-16 11:42:27 发布

SIGES

最新推荐文章于 2023-02-16 11:42:27 发布

阅读量1.3k

点赞数

Sobel图像滤波器是一种简单的卷积滤波器，主要用于边缘检测算法。

样例结果

输入图像

输出图像

算法

　　一种做图像边缘检测的技术是，找出图像的梯度。大梯度值的区域对应图像色彩或密度的剧变区域。典型地，这些区域是边缘。

　　如果你对于幅图像卷积两个Sobel算子，你会得到两个输出：

　> X方向的梯度(dX)

　> Y方向的梯度(dY)

　　通常梯度被合并，给出一个总的梯度图。

　　Sobel算子：dX与dY：

[cpp]view plaincopyprint? 
   
 -1  0  1  
 -2  0  2  
 -1  0  1  

[cpp]view plaincopyprint? 
   
  1  2  1  
  0  0  0  
 -1 -2 -1  

实现

填充

　　值得注意的是，我们在此并不考虑填充。在两个维度，输出图像相比小两个像素。因为，每个输出需要围绕它的像素，计算边沿像素的输出是不可能的。在此样例中，我们简单地把边缘输出像素用其初始值。

　　有时需要输出信号的大小与输入信号的大小一样，这种种情况下，"填充"必须应用到输入中，考虑到滤波应用的实际本性上是减少大小的。填充策略会有不同，但是对于图像，通常的选择是在所有边上重复边缘值(例如，最外层的像素集)，或者在一些边上(在某些情况下)。

实现细节

　　简单起见，我们在一个8比特通道上实现Sobel滤波器。为了在RGB图像上做Sobel滤波，你可以独立地在每个通道上运行Sobel滤波，然后组合结果。在这个例子中，我们取了一张RGB图像，转换成8位辉度图，然后发送到GPU。

　　每个Sobel计算给出一个中心掩码像素的输出。中心像素的输出值是围绕像素的3x3网格上的像素值乘以Sobel掩码的和。这可以被分成三个阶段，通过为网格上的每一行独立地做乘法和累加(这一方法被使用在这个样例中)。

代码

　　注意：我们考虑矢量化是在本教程中使用的主要优化技术。为了显示使用适量的重要性，一个不包含矢量的Sobel滤波器的OpenCL实现被包含在sobel_no_vectors sample(sobel_no_vectors.cpp和sobel_no_vectors.cl)。在Mali-T600系列GPU上，矢量化的版本快很多。

　　除非另作说明，否则所有的代码片段来自sobel.cl中的OpenCL内核。

1. 选择内核大小

　　我们在内核中使用向量类型，因此我们实际上每个内核输出16个结果。看下面更多的向量化细节。内核使用3x3的Sobel滤波器到一个输入图像中的18x3的窗，在两个输出图像中来生成16x1结果，代表梯度的dx和dy。

[cpp]view plaincopyprint? 
   
 /* 
  * Each kernel calculates 16 output pixels in the same row (hence the '* 16'). 
  * column is in the range [0, width] in steps of 16. 
  * row is in the range [0, height]. 
  */  
 const int column = get_global_id(0) * 16;  
 const int row = get_global_id(1) * 1;  
 /* Offset calculates the position in the linear data for the row and the column. */  
 const int offset = row * width + column;  

　　当我们在sobel.cpp中排队内核时，相应地减少工作大小：

[cpp]view plaincopyprint? 
   
 /* 
  * Each instance of the kernel operates on a 16 * 1 portion of the image. 
  * Therefore, the global work size must be width / 16 by height / 1 work items. 
  */  
 size_t globalWorksize[2] = {width / 16, height / 1};  

2. 加载输入数据

　　Mali-T600系列GPU具有128位矢量寄存器，可以在矢量类型上做运算。因此，我们使用OpenCL矢量，来使得更加有效地使用硬件，从而有更高的性能。

　　这里我们从一行数据中做矢量加载：

[cpp]view plaincopyprint? 
   
 /* 
  * First row of input. 
  * In a scalar Sobel calculation you would load 1 value for leftLoad, middleLoad and rightLoad. 
  * In the vector case we load 16 values for each. 
  * leftLoad, middleLoad and rightLoad load 16-bytes of data from the first row. 
  * The data they load overlaps. e.g. for the first column and row, leftLoad is 0->15, middleLoad is 1->16 and rightLoad is 2->17. 
  * So we're actually loading 18-bytes of data from the first row. 
  */  
 uchar16 leftLoad = vload16(0, inputImage + (offset + 0));  
 uchar16 middleLoad = vload16(0, inputImage + (offset + 1));  
 uchar16 rightLoad = vload16(0, inputImage + (offset + 2));  

3. 转换数据

　　在Mali-T600系列GPU上，扩展和收缩数据类型是一个很容易地操作。在此，我们将8比特每像素数据转换成16比特每像素。

[cpp]view plaincopyprint? 
   
 /* 
  * Convert the data from unsigned chars to shorts (8-bit unsigned to 16-bit signed). 
  * The calculations can overflow 8-bits so we require larger intermediate storage. 
  * Additionally, the values can become negative so we need a signed type. 
  */  
 short16 leftData = convert_short16(leftLoad);  
 short16 middleData = convert_short16(middleLoad);  
 short16 rightData = convert_short16(rightLoad);  

4. 做计算

　　然后，我们在计算16个像素的结果。在Mali-T600系列GPU上，每个向量计算可以作为一个单一操作完成。

[cpp]view plaincopyprint? 
   
 /* 
  * Calculate the results for the first row. 
  * Looking at the Sobel masks above for the first line of input, 
  * the dX calculation is the sum of 1 * leftData, 0 * middleData, and -1 * rightData. 
  * The dY calculation is the sum of 1 * leftData, 2 * middleData, and 1 * rightData. 
  * This is what is being calculated below, except we have removed the 
  * unnecessary calculations (multiplications by 1 or 0) and we are calculating 16 values at once. 
  * This pattern repeats for the other 2 rows of data. 
  */  
 short16 dx = rightData - leftData;  
 short16 dy = rightData + leftData + middleData * (short)2;  

5. 排列结果

　　最后，我们收缩和存储数据。我们使用一个向量存储一次写16个结果：

[cpp]view plaincopyprint? 
   
 /* 
  * Store the results. 
  * The range of outputs from our Sobel calculations is [-1020, 1020]. 
  * In order to output this as an 8-bit signed char we must divide it by 8 (or shift right 3 times). 
  * This gives the range [-128, 128]. Depending on what type of output you require, 
  * (signed/unsigned, seperate/combined gradients) it is possible to do more of the calculations on the GPU using OpenCL. 
  * In this sample we're assuming that the application requires signed uncombined gradient outputs. 
  */  
 vstore16(convert_char16(dx >> 3), 0, outputImageDX + offset + width + 1);  
 vstore16(convert_char16(dy >> 3), 0, outputImageDY + offset + width + 1);  

　　因为数据作为两个分开的梯度图返回，我们在CPU上将它们先组合，然后将它们作为一个位图写入。

运行样例

　　参见样例1。

附录1：内核源码

[cpp]view plaincopyprint? 
   
 /* 
  * This confidential and proprietary software may be used only as 
  * authorised by a licensing agreement from ARM Limited 
  *    (C) COPYRIGHT 2013 ARM Limited 
  *        ALL RIGHTS RESERVED 
  * The entire notice above must be reproduced on all authorised 
  * copies and copies may only be made to the extent permitted 
  * by a licensing agreement from ARM Limited. 
  */  
   
 /** 
  * \brief Sobel filter kernel function. 
  * \param[in] inputImage Input image data in row-major format. 
  * \param[in] width Width of the image passed in as inputImage. 
  * \param[out] outputImageDX Output image of the calculated gradient in the X direction. 
  * \param[out] outputImageDY Output image of the calculated gradient in the Y direction. 
  */  
 __kernel void sobel(__global const uchar* restrict inputImage,  
                     const int width,  
                     __global char* restrict outputImageDX,  
                     __global char* restrict outputImageDY)  
 {  
     /* [Kernel size] */  
     /* 
      * Each kernel calculates 16 output pixels in the same row (hence the '* 16'). 
      * column is in the range [0, width] in steps of 16. 
      * row is in the range [0, height]. 
      */  
     const int column = get_global_id(0) * 16;  
     const int row = get_global_id(1) * 1;  
   
     /* Offset calculates the position in the linear data for the row and the column. */  
     const int offset = row * width + column;  
     /* [Kernel size] */  
   
     /* [Load row] */  
     /* 
      * First row of input. 
      * In a scalar Sobel calculation you would load 1 value for leftLoad, middleLoad and rightLoad. 
      * In the vector case we load 16 values for each. 
      * leftLoad, middleLoad and rightLoad load 16-bytes of data from the first row. 
      * The data they load overlaps. e.g. for the first column and row, leftLoad is 0->15, middleLoad is 1->16 and rightLoad is 2->17. 
      * So we're actually loading 18-bytes of data from the first row. 
      */  
     uchar16 leftLoad = vload16(0, inputImage + (offset + 0));  
     uchar16 middleLoad = vload16(0, inputImage + (offset + 1));  
     uchar16 rightLoad = vload16(0, inputImage + (offset + 2));  
     /* [Load row] */  
   
     /* [Convert data] */  
     /* 
      * Convert the data from unsigned chars to shorts (8-bit unsigned to 16-bit signed). 
      * The calculations can overflow 8-bits so we require larger intermediate storage. 
      * Additionally, the values can become negative so we need a signed type. 
      */  
     short16 leftData = convert_short16(leftLoad);  
     short16 middleData = convert_short16(middleLoad);  
     short16 rightData = convert_short16(rightLoad);  
     /* [Convert data] */  
   
     /* [Calculation] */  
     /* 
      * Calculate the results for the first row. 
      * Looking at the Sobel masks above for the first line of input, 
      * the dX calculation is the sum of 1 * leftData, 0 * middleData, and -1 * rightData. 
      * The dY calculation is the sum of 1 * leftData, 2 * middleData, and 1 * rightData. 
      * This is what is being calculated below, except we have removed the 
      * unnecessary calculations (multiplications by 1 or 0) and we are calculating 16 values at once. 
      * This pattern repeats for the other 2 rows of data. 
      */  
     short16 dx = rightData - leftData;  
     short16 dy = rightData + leftData + middleData * (short)2;  
     /* [Calculation] */  
   
     /* 
      * Second row of input. 
      * By adding the 'width * 1' to the offset we get the next row of data at the same column position. 
      * middleData is not loaded because it is not used in any of the calculations. 
      */  
     leftLoad = vload16(0, inputImage + (offset + width * 1 + 0));  
     rightLoad = vload16(0, inputImage + (offset + width * 1 + 2));  
   
     leftData = convert_short16(leftLoad);  
     rightData = convert_short16(rightLoad);  
   
     /* 
      * Calculate the results for the second row. 
      * The dX calculation is the sum of -2 * leftData, 0 * middleData, and -2 * rightData. 
      * There is no dY calculation to do: sum of 0 * leftData, 0 * middleData, and 0 * rightData. 
      */  
     dx += (rightData - leftData) * (short)2;  
   
     /* Third row of input. */  
     leftLoad = vload16(0, inputImage + (offset + width * 2 + 0));  
     middleLoad = vload16(0, inputImage + (offset + width * 2 + 1));  
     rightLoad = vload16(0, inputImage + (offset + width * 2 + 2));  
   
     leftData = convert_short16(leftLoad);  
     middleData = convert_short16(middleLoad);  
     rightData = convert_short16(rightLoad);  
   
     /* 
      * Calculate the results for the third row. 
      * The dX calculation is the sum of -1 * leftData, 0 * middleData, and -1 * rightData. 
      * The dY calculation is the sum of -1 * leftData, -2 * middleData, and -1 * rightData. 
      */  
     dx += rightData - leftData;  
     dy -= rightData + leftData + middleData * (short)2;  
   
     /* [Store] */  
     /* 
      * Store the results. 
      * The range of outputs from our Sobel calculations is [-1020, 1020]. 
      * In order to output this as an 8-bit signed char we must divide it by 8 (or shift right 3 times). 
      * This gives the range [-128, 128]. Depending on what type of output you require, 
      * (signed/unsigned, seperate/combined gradients) it is possible to do more of the calculations on the GPU using OpenCL. 
      * In this sample we're assuming that the application requires signed uncombined gradient outputs. 
      */  
     vstore16(convert_char16(dx >> 3), 0, outputImageDX + offset + width + 1);  
     vstore16(convert_char16(dy >> 3), 0, outputImageDY + offset + width + 1);  
     /* [Store] */  
 }  

附录2：宿主机源码

[cpp]view plaincopyprint? 
   
 /* 
  * This confidential and proprietary software may be used only as 
  * authorised by a licensing agreement from ARM Limited 
  *    (C) COPYRIGHT 2013 ARM Limited 
  *        ALL RIGHTS RESERVED 
  * The entire notice above must be reproduced on all authorised 
  * copies and copies may only be made to the extent permitted 
  * by a licensing agreement from ARM Limited. 
  */  
   
 #include "common.h"  
 #include "image.h"  
   
 #include <CL/cl.h>  
 #include <iostream>  
 #include <fstream>  
 #include <sstream>  
 #include <cstddef>  
 #include <cmath>  
   
 using namespace std;  
   
 /** 
  * \brief Simple Sobel filter OpenCL sample. 
  * \details A sample which loads a bitmap and then passes it to the GPU. 
  *          An OpenCL kernel which does Sobel filtering is then run on the data. 
  *          The gradients of the image in x and y directions are returned by the GPU 
  *          and are combined on the CPU to form the filtered data. 
  *          The input image is loaded from assets/input.bmp. The output gradients in X and Y, 
  *          as well as the combined gradient image are stored in output-dX.bmp, output-dY.bmp 
  *          and output.bmp respectively. 
  * \return The exit code of the application, non-zero if a problem occurred. 
  */  
 int main(void)  
 {  
     /* 
      * Name of the bitmap to load and run the sobel filter on. 
      * It's width must be divisible by 16. 
      */  
     string filename = "assets/input.bmp";  
   
     cl_context context = 0;  
     cl_command_queue commandQueue = 0;  
     cl_program program = 0;  
     cl_device_id device = 0;  
     cl_kernel kernel = 0;  
     const unsigned int numberOfMemoryObjects = 3;  
     cl_mem memoryObjects[numberOfMemoryObjects] = {0, 0, 0};  
     cl_int errorNumber;  
   
     if (!createContext(&context))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed to create an OpenCL context. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     if (!createCommandQueue(context, &commandQueue, &device))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed to create the OpenCL command queue. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     if (!createProgram(context, device, "assets/sobel.cl", &program))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed to create OpenCL program." << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     kernel = clCreateKernel(program, "sobel", &errorNumber);  
     if (!checkSuccess(errorNumber))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed to create OpenCL kernel. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     /* Load 24-bits per pixel RGB data from a bitmap. */  
     cl_int width;  
     cl_int height;  
     unsigned char* imageData = NULL;  
     if (!loadFromBitmap(filename, &width, &height, &imageData))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed loading bitmap. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     /* All buffers are the size of the image data. */  
     size_t bufferSize = width * height * sizeof(cl_uchar);  
   
     bool createMemoryObjectsSuccess = true;  
     /* Create one input buffer for the image data, and two output buffers for the gradients in X and Y respectively. */  
     memoryObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, bufferSize, NULL, &errorNumber);  
     createMemoryObjectsSuccess &= checkSuccess(errorNumber);  
     memoryObjects[1] = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, bufferSize, NULL, &errorNumber);  
     createMemoryObjectsSuccess &= checkSuccess(errorNumber);  
     memoryObjects[2] = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, bufferSize, NULL, &errorNumber);  
     createMemoryObjectsSuccess &= checkSuccess(errorNumber);  
     if (!createMemoryObjectsSuccess)  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed to create OpenCL buffers. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     /* Map the input luminance memory object to a host side pointer. */  
     cl_uchar* luminance = (cl_uchar*)clEnqueueMapBuffer(commandQueue, memoryObjects[0], CL_TRUE, CL_MAP_WRITE, 0, bufferSize, 0, NULL, NULL, &errorNumber);  
     if (!checkSuccess(errorNumber))  
     {  
        cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
        cerr << "Mapping memory objects failed " << __FILE__ << ":"<< __LINE__ << endl;  
        return 1;  
     }  
   
     /* Convert 24-bits per pixel RGB into 8-bits per pixel luminance data. */  
     RGBToLuminance(imageData, luminance, width, height);  
   
     delete [] imageData;  
   
     /* Unmap the memory so we can pass it to the kernel. */  
     if (!checkSuccess(clEnqueueUnmapMemObject(commandQueue, memoryObjects[0], luminance, 0, NULL, NULL)))  
     {  
        cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
        cerr << "Unmapping memory objects failed " << __FILE__ << ":"<< __LINE__ << endl;  
        return 1;  
     }  
   
     bool setKernelArgumentsSuccess = true;  
     /* Setup the kernel arguments. */  
     setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 0, sizeof(cl_mem), &memoryObjects[0]));  
     setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 1, sizeof(cl_int), &width));  
     setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 2, sizeof(cl_mem), &memoryObjects[1]));  
     setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 3, sizeof(cl_mem), &memoryObjects[2]));  
     if (!setKernelArgumentsSuccess)  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed setting OpenCL kernel arguments. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     /* An event to associate with the Kernel. Allows us to retreive profiling information later. */  
     cl_event event = 0;  
   
     /* [Kernel size] */  
     /* 
      * Each instance of the kernel operates on a 16 * 1 portion of the image. 
      * Therefore, the global work size must be width / 16 by height / 1 work items. 
      */  
     size_t globalWorksize[2] = {width / 16, height / 1};  
     /* [Kernel size] */  
   
     /* Enqueue the kernel */  
     if (!checkSuccess(clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL, globalWorksize, NULL, 0, NULL, &event)))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed enqueuing the kernel. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     /* Wait for completion */  
     if (!checkSuccess(clFinish(commandQueue)))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed waiting for kernel execution to finish. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     /* Print the profiling information for the event. */  
     printProfilingInfo(event);  
     /* Release the event object. */  
     if (!checkSuccess(clReleaseEvent(event)))  
     {  
         cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
         cerr << "Failed releasing the event object. " << __FILE__ << ":"<< __LINE__ << endl;  
         return 1;  
     }  
   
     /* Map the arrays holding the output gradients. */  
     bool mapMemoryObjectsSuccess = true;  
     cl_char* outputDx = (cl_char*)clEnqueueMapBuffer(commandQueue, memoryObjects[1], CL_TRUE, CL_MAP_READ, 0, bufferSize, 0, NULL, NULL, &errorNumber);  
     mapMemoryObjectsSuccess &= checkSuccess(errorNumber);  
     cl_char* outputDy = (cl_char*)clEnqueueMapBuffer(commandQueue, memoryObjects[2], CL_TRUE, CL_MAP_READ, 0, bufferSize, 0, NULL, NULL, &errorNumber);  
     mapMemoryObjectsSuccess &= checkSuccess(errorNumber);  
     if (!mapMemoryObjectsSuccess)  
     {  
        cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
        cerr << "Mapping memory objects failed " << __FILE__ << ":"<< __LINE__ << endl;  
        return 1;  
     }  
   
     /* 
      * In order to better visualise the data, we take the absolute 
      * value of the gradients and then combine them together. 
      * This is work which could be done in the OpenCL kernel, it all depends 
      * on what type of output you require. 
      */  
   
     /* To visualise the data we take the absolute values of the gradients. */  
     unsigned char *absDX = new unsigned char[width * height];  
     unsigned char *absDY = new unsigned char[width * height];  
     for (int i = 0; i < width * height; i++)  
     {  
         absDX[i] = abs(outputDx[i]);  
         absDY[i] = abs(outputDy[i]);  
     }  
   
     /* Unmap the memory. */  
     bool unmapMemoryObjectsSuccess = true;  
     unmapMemoryObjectsSuccess &= checkSuccess(clEnqueueUnmapMemObject(commandQueue, memoryObjects[1], outputDx, 0, NULL, NULL));  
     unmapMemoryObjectsSuccess &= checkSuccess(clEnqueueUnmapMemObject(commandQueue, memoryObjects[2], outputDy, 0, NULL, NULL));  
     if (!unmapMemoryObjectsSuccess)  
     {  
        cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
        cerr << "Unmapping memory objects failed " << __FILE__ << ":"<< __LINE__ << endl;  
        return 1;  
     }  
   
     /* Release OpenCL objects. */  
     cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numberOfMemoryObjects);  
   
     /* Convert the two output luminance arrays to RGB and save them out to files. */  
     unsigned char* rgbOut = new unsigned char[width * height * 3];  
     luminanceToRGB(absDX, rgbOut, width, height);  
     saveToBitmap("output-dX.bmp", width, height, rgbOut);  
   
     luminanceToRGB(absDY, rgbOut, width, height);  
     saveToBitmap("output-dY.bmp", width, height, rgbOut);  
   
     /* Calculate the total gradient of the image, convert it to RGB and store it out to a file. */  
     unsigned char* totalOutput = new unsigned char[width * height];  
     for (int index = 0; index < width * height; index++)  
     {  
         totalOutput[index] = sqrt(pow(absDX[index], 2) + pow(absDY[index], 2));  
     }  
     luminanceToRGB(totalOutput, rgbOut, width, height);  
     saveToBitmap("output.bmp", width, height, rgbOut);  
   
     delete [] absDX;  
     delete [] absDY;  
     delete [] rgbOut;  
     delete [] totalOutput;  
   
     return 0;  
 }  

SIGES

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
Mali OpenCL SDK v1.1.0 教程样例之六“索贝尔滤波器”

Sobel图像滤波器是一种简单的卷积滤波器，主要用于边缘检测算法。样例结果输入图像输出图像算法　　一种做图像边缘检测的技术是，找出图像的梯度。大梯度值的区域对应图像色彩或密度的剧变区域。典型地，这些区域是边缘。　　如果你对于幅图像卷积两个Sobel算子，你会得到两个输出：
复制链接

扫一扫