reference
demo code
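1 RK3588 Mali GPU
The RK3588 integrates an embedded ARM Mali-G610 3D GPU that supports OpenGL ES 1.1, 2.0 and 3.2, OpenCL 2.2, and Vulkan 1.2. A dedicated 2D hardware engine with an MMU maximizes display performance and provides very smooth operation.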
2 API support
OpenGL® ES 1.1, 2.0, 3.1, 3.2
Vulkan 1.1, 1.2
OpenCL™ 1.1, 1.2, 2.0 Full profile
Full support for next-generation and legacy 2D/3D graphics applications
3 demo result
General configuration for OpenCV 3.4.5 =====================================
Version control: unknown
Platform:
Timestamp: 2023-05-10T09:40:39Z
Host: Linux 5.10.110 aarch64
CMake: 3.16.3
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: RELEASE
CPU/HW features:
Baseline: NEON FP16
required: NEON
disabled: VFPV3
C/C++:
Built as dynamic libs?: YES
C++11: YES
C++ Compiler: /usr/bin/c++ (ver 9.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release):
Linker flags (Debug):
ccache: NO
Precompiled headers: YES
Extra dependencies: dl m pthread rt
3rdparty dependencies:
OpenCV modules:
To be built: calib3d core dnn features2d flann highgui imgcodecs imgproc java_bindings_generator ml objdetect photo python_bindings_generator shape stitching superres ts video videoio videostab
Disabled: world
Disabled by dependency: -
Unavailable: cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev java js python2 python3 viz
Applications: tests perf_tests apps
Documentation: NO
Non-free algorithms: NO
GUI:
GTK+: YES (ver 3.24.20)
GThread : YES (ver 2.64.6)
GtkGlExt: NO
VTK support: NO
Media I/O:
ZLib: /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
JPEG: /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
WEBP: build (ver encoder: 0x020e)
PNG: /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.37)
TIFF: /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
JPEG 2000: build (ver 1.900.1)
OpenEXR: build (ver 1.7.1)
HDR: YES
SUNRASTER: YES
PXM: YES
Video I/O:
DC1394: YES (ver 2.2.5)
FFMPEG: YES
avcodec: YES (ver 58.54.100)
avformat: YES (ver 58.29.100)
avutil: YES (ver 56.31.100)
swscale: YES (ver 5.5.100)
avresample: NO
GStreamer:
base: YES (ver 1.16.2)
video: YES (ver 1.16.2)
app: YES (ver 1.16.2)
riff: YES (ver 1.16.2)
pbutils: YES (ver 1.16.2)
libv4l/libv4l2: NO
v4l/v4l2: linux/videodev2.h
Parallel framework: pthreads
Trace: YES (built-in)
Other third-party libraries:
Lapack: NO
Eigen: YES (ver 3.3.7)
Custom HAL: YES (carotene (ver 0.0.1))
Protobuf: build (3.5.1)
OpenCL: YES (no extra features)
Include path: /home/firefly/sofeware/opencv/opencv-3.4.5/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python (for build): /usr/bin/python2.7
Java:
ant: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Install to: /usr/local
-----------------------------------------------------------------
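The report above is what `cv::getBuildInformation()` returns as one long string. When only one entry matters, such as whether the `OpenCL:` line says `YES`, it can be easier to pull that entry out than to read the whole report. The following is a minimal sketch, assuming each entry sits on its own line in `Key: value` form. `buildInfoValue` is a hypothetical helper, not part of OpenCV, and it uses only the standard library, so the report is passed in as a plain string.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCV API): returns the text after "key:" on
// the first line whose content, ignoring indentation, starts with "key:".
// Returns an empty string if no line matches or the entry has no value.
// Assumes one "Key: value" entry per line.
std::string buildInfoValue(const std::string& report, const std::string& key)
{
    const std::string prefix = key + ":";
    std::istringstream lines(report);
    std::string line;
    while (std::getline(lines, line))
    {
        // Skip leading indentation.
        const std::size_t begin = line.find_first_not_of(" \t");
        if (begin == std::string::npos)
            continue;
        if (line.compare(begin, prefix.size(), prefix) != 0)
            continue;
        // Trim whitespace on both sides of the value.
        const std::size_t valueBegin =
            line.find_first_not_of(" \t\r", begin + prefix.size());
        if (valueBegin == std::string::npos)
            return "";
        const std::size_t valueEnd = line.find_last_not_of(" \t\r");
        return line.substr(valueBegin, valueEnd - valueBegin + 1);
    }
    return "";
}
```

With the build shown here, `buildInfoValue(cv::getBuildInformation(), "OpenCL")` should yield `YES (no extra features)`, which a program could test before it enables the OpenCL path.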
getNumThreads=8
getNumberOfCPUs=8
useOptimized=1
CV_CPU_SSE4_1=0
CV_CPU_SSE4_2=0
CV_CPU_AVX2=0
HardwareSupport:
CV_CPU_MMX: 0
CV_CPU_SSE: 0
CV_CPU_SSE2: 0
CV_CPU_SSE3: 0
CV_CPU_SSSE3: 0
CV_CPU_SSE4_1: 0
CV_CPU_SSE4_2: 0
CV_CPU_POPCNT: 0
CV_CPU_FP16: 1
CV_CPU_AVX: 0
CV_CPU_AVX2: 0
CV_CPU_FMA3: 0
CV_CPU_AVX_512F: 0
CV_CPU_AVX_512BW: 0
CV_CPU_AVX_512CD: 0
CV_CPU_AVX_512DQ: 0
CV_CPU_AVX_512ER: 0
CV_CPU_AVX_512IFMA512: 0
CV_CPU_AVX_512IFMA: 0
CV_CPU_AVX_512PF: 0
CV_CPU_AVX_512VBMI: 0
CV_CPU_AVX_512VL: 0
CV_CPU_NEON: 1
CV_CPU_VSX: 0
CV_CPU_AVX512_SKX: 0
CV_HARDWARE_MAX_FEATURE: 0
***********SDK************
Name:ARM Platform
Vendor:ARM
Version:OpenCL 2.1 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd
Number of devices:1
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
***********Device 1***********
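The platform `Version` string above follows the layout fixed by the OpenCL specification: the literal `OpenCL`, a space, `major.minor`, a space, then vendor-specific text. Here the driver reports 2.1, while the build summary shows OpenCV compiled against its bundled 1.2 headers. The following is a minimal sketch of extracting the numeric version so it can be compared in code. `OpenCLVersion`, `parseOpenCLVersion` and `isAtLeast` are hypothetical names, not OpenCV or OpenCL APIs.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper type: numeric OpenCL version, {0, 0} when parsing fails.
struct OpenCLVersion
{
    int majorVer;
    int minorVer;
};

// Parses a string of the form "OpenCL <major>.<minor> <vendor text>".
// Returns {0, 0} if the string does not start with that pattern.
OpenCLVersion parseOpenCLVersion(const std::string& text)
{
    int majorVer = 0;
    int minorVer = 0;
    if (std::sscanf(text.c_str(), "OpenCL %d.%d", &majorVer, &minorVer) != 2)
        return OpenCLVersion{0, 0};
    return OpenCLVersion{majorVer, minorVer};
}

// True if version v is greater than or equal to majorVer.minorVer.
bool isAtLeast(const OpenCLVersion& v, int majorVer, int minorVer)
{
    return v.majorVer > majorVer ||
           (v.majorVer == majorVer && v.minorVer >= minorVer);
}
```

In the demo, the input would be the string returned by `sdk.version()`; for example, `isAtLeast(parseOpenCLVersion(sdk.version()), 1, 2)` checks that the driver offers at least the 1.2 level the headers were built for.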
4 code
#include <iostream>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

// Path of the test image; adjust it for the target system.
#define IMAGE_PATHNAME "D:\\allike\\Image_20210707220543807.jpg"

void checkBuild()
{
    // Print OpenCV's build-time configuration.
    std::cout << cv::getBuildInformation() << std::endl;
}

void checkSimd()
{
    // Query OpenCV's thread settings.
    int numTh = cv::getNumThreads(); // defaults to the number of logical CPU threads
    int numCore = cv::getNumberOfCPUs();
    std::cout << "getNumThreads=" << numTh << std::endl;
    std::cout << "getNumberOfCPUs=" << numCore << std::endl;

    // Check whether OpenCV's optimized code paths are currently enabled.
    bool opt = cv::useOptimized(); // defaults to true
    std::cout << "useOptimized=" << opt << std::endl;

    // Check whether specific CPU instruction sets are supported.
    bool check1 = cv::checkHardwareSupport(CV_CPU_SSE4_1);
    bool check2 = cv::checkHardwareSupport(CV_CPU_SSE4_2);
    bool check3 = cv::checkHardwareSupport(CV_CPU_AVX2);
    std::cout << "CV_CPU_SSE4_1=" << check1 << std::endl;
    std::cout << "CV_CPU_SSE4_2=" << check2 << std::endl;
    std::cout << "CV_CPU_AVX2=" << check3 << std::endl;

    // Print the full hardware support list.
    std::cout << "HardwareSupport:" << std::endl;
    std::cout << "CV_CPU_MMX: " << cv::checkHardwareSupport(CV_CPU_MMX) << std::endl;
    std::cout << "CV_CPU_SSE: " << cv::checkHardwareSupport(CV_CPU_SSE) << std::endl;
    std::cout << "CV_CPU_SSE2: " << cv::checkHardwareSupport(CV_CPU_SSE2) << std::endl;
    std::cout << "CV_CPU_SSE3: " << cv::checkHardwareSupport(CV_CPU_SSE3) << std::endl;
    std::cout << "CV_CPU_SSSE3: " << cv::checkHardwareSupport(CV_CPU_SSSE3) << std::endl;
    std::cout << "CV_CPU_SSE4_1: " << cv::checkHardwareSupport(CV_CPU_SSE4_1) << std::endl;
    std::cout << "CV_CPU_SSE4_2: " << cv::checkHardwareSupport(CV_CPU_SSE4_2) << std::endl;
    std::cout << "CV_CPU_POPCNT: " << cv::checkHardwareSupport(CV_CPU_POPCNT) << std::endl;
    std::cout << "CV_CPU_FP16: " << cv::checkHardwareSupport(CV_CPU_FP16) << std::endl;
    std::cout << "CV_CPU_AVX: " << cv::checkHardwareSupport(CV_CPU_AVX) << std::endl;
    std::cout << "CV_CPU_AVX2: " << cv::checkHardwareSupport(CV_CPU_AVX2) << std::endl;
    std::cout << "CV_CPU_FMA3: " << cv::checkHardwareSupport(CV_CPU_FMA3) << std::endl;
    std::cout << "CV_CPU_AVX_512F: " << cv::checkHardwareSupport(CV_CPU_AVX_512F) << std::endl;
    std::cout << "CV_CPU_AVX_512BW: " << cv::checkHardwareSupport(CV_CPU_AVX_512BW) << std::endl;
    std::cout << "CV_CPU_AVX_512CD: " << cv::checkHardwareSupport(CV_CPU_AVX_512CD) << std::endl;
    std::cout << "CV_CPU_AVX_512DQ: " << cv::checkHardwareSupport(CV_CPU_AVX_512DQ) << std::endl;
    std::cout << "CV_CPU_AVX_512ER: " << cv::checkHardwareSupport(CV_CPU_AVX_512ER) << std::endl;
    std::cout << "CV_CPU_AVX_512IFMA512: " << cv::checkHardwareSupport(CV_CPU_AVX_512IFMA512) << std::endl;
    std::cout << "CV_CPU_AVX_512IFMA: " << cv::checkHardwareSupport(CV_CPU_AVX_512IFMA) << std::endl;
    std::cout << "CV_CPU_AVX_512PF: " << cv::checkHardwareSupport(CV_CPU_AVX_512PF) << std::endl;
    std::cout << "CV_CPU_AVX_512VBMI: " << cv::checkHardwareSupport(CV_CPU_AVX_512VBMI) << std::endl;
    std::cout << "CV_CPU_AVX_512VL: " << cv::checkHardwareSupport(CV_CPU_AVX_512VL) << std::endl;
    std::cout << "CV_CPU_NEON: " << cv::checkHardwareSupport(CV_CPU_NEON) << std::endl;
    std::cout << "CV_CPU_VSX: " << cv::checkHardwareSupport(CV_CPU_VSX) << std::endl;
    std::cout << "CV_CPU_AVX512_SKX: " << cv::checkHardwareSupport(CV_CPU_AVX512_SKX) << std::endl;
    std::cout << "CV_HARDWARE_MAX_FEATURE: " << cv::checkHardwareSupport(CV_HARDWARE_MAX_FEATURE) << std::endl;
    std::cout << std::endl;

    //cv::setUseOptimized(false);
    //cv::setNumThreads(1);
}

// OpenCL (Open Computing Language) code can run on the host CPU or on an attached GPU.
void checkOpenCL()
{
    std::vector<cv::ocl::PlatformInfo> info;
    cv::ocl::getPlatfomsInfo(info); // "Platfoms" is the spelling used by the OpenCV API
    if (info.empty())
    {
        std::cout << "No OpenCL platform found" << std::endl;
        return;
    }

    cv::ocl::PlatformInfo sdk = info.at(0);
    int number = sdk.deviceNumber();
    if (number < 1)
    {
        std::cout << "Number of devices:" << number << std::endl;
        return;
    }

    std::cout << "***********SDK************" << std::endl;
    std::cout << "Name:" << sdk.name() << std::endl;
    std::cout << "Vendor:" << sdk.vendor() << std::endl;
    std::cout << "Version:" << sdk.version() << std::endl;
    std::cout << "Number of devices:" << number << std::endl;

    for (int i = 0; i < number; i++)
    {
        std::cout << std::endl;
        cv::ocl::Device device;
        sdk.getDevice(device, i);
        std::cout << "***********Device " << i + 1 << "***********" << std::endl;
        std::cout << "Vendor Id:" << device.vendorID() << std::endl;
        std::cout << "Vendor name:" << device.vendorName() << std::endl;
        std::cout << "Name:" << device.name() << std::endl;
        // Print the driver version string (the original printed vendorID() here).
        std::cout << "Driver version:" << device.driverVersion() << std::endl;
        if (device.isAMD())
            std::cout << "Is AMD device" << std::endl;
        if (device.isIntel())
            std::cout << "Is Intel device" << std::endl;
        if (device.isNVidia())
            std::cout << "Is NVidia device" << std::endl;
        std::cout << "Global Memory size:" << device.globalMemSize() << std::endl;
        std::cout << "Memory cache size:" << device.globalMemCacheSize() << std::endl;
        std::cout << "Memory cache type:" << device.globalMemCacheType() << std::endl;
        std::cout << "Local Memory size:" << device.localMemSize() << std::endl;
        std::cout << "Local Memory type:" << device.localMemType() << std::endl;
        std::cout << "Max Clock frequency:" << device.maxClockFrequency() << std::endl;
    }
}

void calcEdgesCPU()
{
    // Force the plain CPU path by disabling OpenCL.
    cv::ocl::setUseOpenCL(false);
    bool ret1 = cv::ocl::haveOpenCL();
    bool ret2 = cv::ocl::useOpenCL();
    std::cout << "haveOpenCL:" << ret1 << std::endl;
    std::cout << "useOpenCL:" << ret2 << std::endl;

    // The measured time includes loading the image from disk.
    double start = cv::getTickCount();
    cv::Mat cpuGray, cpuBlur, cpuEdges;
    cv::Mat cpuFrame = cv::imread(IMAGE_PATHNAME);
    if (cpuFrame.empty())
    {
        std::cout << "Failed to read image: " << IMAGE_PATHNAME << std::endl;
        return;
    }
    cv::cvtColor(cpuFrame, cpuGray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(cpuGray, cpuBlur, cv::Size(3, 3), 15, 15);
    cv::Canny(cpuBlur, cpuEdges, 50, 100, 3);
    std::vector<cv::Vec3f> cir;
    // HOUGH_GRADIENT_ALT is only defined in newer OpenCV releases; older ones
    // may need cv::HOUGH_GRADIENT, whose parameters have different meanings.
    cv::HoughCircles(cpuBlur, cir, cv::HOUGH_GRADIENT_ALT, 1.5, 15, 300, 0.8, 1, 100);
    std::cout << "CPU cost time:(s)" << ((cv::getTickCount() - start) / cv::getTickFrequency()) << std::endl;

    cv::namedWindow("Canny Edges CPU", cv::WINDOW_NORMAL);
    cv::imshow("Canny Edges CPU", cpuEdges);
}

void calcEdgesGPU()
{
    cv::ocl::setUseOpenCL(true);
    bool ret1 = cv::ocl::haveOpenCL();
    bool ret2 = cv::ocl::useOpenCL();
    std::cout << "haveOpenCL:" << ret1 << std::endl;
    std::cout << "useOpenCL:" << ret2 << std::endl;

    // With UMat objects, OpenCV automatically runs on the GPU on devices that
    // support OpenCL and falls back to the CPU on devices that do not. This
    // avoids runtime failures and keeps a single interface for both cases.
    double start = cv::getTickCount();
    cv::UMat gpuFrame, gpuGray, gpuBlur, gpuEdges;
    cv::Mat cpuFrame = cv::imread(IMAGE_PATHNAME);
    if (cpuFrame.empty())
    {
        std::cout << "Failed to read image: " << IMAGE_PATHNAME << std::endl;
        return;
    }
    cpuFrame.copyTo(gpuFrame); // Mat -> UMat
    cv::cvtColor(gpuFrame, gpuGray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(gpuGray, gpuBlur, cv::Size(3, 3), 15, 15);
    cv::Canny(gpuBlur, gpuEdges, 50, 100, 3);
    std::vector<cv::Vec3f> cir;
    // HOUGH_GRADIENT_ALT is only defined in newer OpenCV releases; older ones
    // may need cv::HOUGH_GRADIENT, whose parameters have different meanings.
    cv::HoughCircles(gpuBlur, cir, cv::HOUGH_GRADIENT_ALT, 1.5, 15, 300, 0.8, 1, 100);
    std::cout << "GPU cost time:(s)" << ((cv::getTickCount() - start) / cv::getTickFrequency()) << std::endl;

    cv::Mat matResult = gpuEdges.getMat(cv::ACCESS_READ); // UMat -> Mat
    cv::namedWindow("Canny Edges GPU1", cv::WINDOW_NORMAL);
    cv::imshow("Canny Edges GPU1", matResult);
    cv::namedWindow("Canny Edges GPU2", cv::WINDOW_NORMAL);
    cv::imshow("Canny Edges GPU2", gpuEdges);
}

int main(int argc, char *argv[])
{
    checkBuild();
    checkSimd();
    checkOpenCL();
    calcEdgesCPU();
    calcEdgesGPU();
    cv::waitKey(0);
    return 0;
}
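
Both `calcEdgesCPU` and `calcEdgesGPU` compute elapsed seconds as a tick difference divided by the tick frequency, measured over a single run that also includes `cv::imread`. On the OpenCL path, the first call of an operation typically also pays for kernel compilation, so a single run can make the GPU look slower than it is in steady state. The following is a minimal sketch of a fairer measurement using only the standard library. `meanSeconds` is a hypothetical helper, not an OpenCV function: it runs a callable for a few untimed warm-up rounds and then averages the timed rounds.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: calls `work` `warmup` times without timing, then
// `runs` times with timing, and returns the mean duration of one timed call
// in seconds. Returns 0.0 when `runs` is not positive.
double meanSeconds(const std::function<void()>& work, int warmup, int runs)
{
    if (runs <= 0)
        return 0.0;

    // Untimed rounds absorb one-off costs such as kernel compilation.
    for (int i = 0; i < warmup; ++i)
        work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return elapsed.count() / runs;
}
```

In the demo, the callable would wrap only the `cvtColor`, `GaussianBlur`, `Canny` and `HoughCircles` calls, with the image loaded beforehand. OpenCL work may still be queued when a call returns, so a callable for the `UMat` path would likely also need to wait for completion, for example with `cv::ocl::finish()`, before the timer stops.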