Building OpenCV with GPU (CUDA) support

Author: 昵称是啥可以喝么？

Last modified: 2023-07-21 17:30:04

Tags: opencv, python, computer vision, pytorch

First published: 2023-07-17 14:42:07

Article link: https://blog.csdn.net/weixin_47721347/article/details/131697103

1. Build configuration:

Reference: "opencv编译gpu版本，支持c++和python调用（踩坑一个月，最终成功）" (Building the GPU version of OpenCV with C++ and Python support: a month of pitfalls, finally successful), 小树苗m's blog, CSDN: https://blog.csdn.net/qq_15060477/article/details/123239963

Its first step is to install the developer build of cuDNN rather than the runtime-only one:

    cp cuda/include/cudnn.h /usr/local/cuda/include/
    cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
    chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Its second step is to install Miniconda.

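2. Fixing the CMake error:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files

Reference: "编译opencv时遇到问题：CUDA_CUDA_LIBRARY" (Problem encountered when building OpenCV: CUDA_CUDA_LIBRARY), lh_lyh's blog, CSDN.

Fix: add the path of every library that CMake reports as NOTFOUND to the build configuration (the cmake command or script) explicitly.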
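One way to apply this fix is to generate the extra -D options from a mapping of variable names to library paths. The following is a minimal sketch only: the variable names and paths are placeholders (assumptions, not taken from my build) and must be replaced with the names printed in your own CMake error and the files of your own CUDA installation.

```python
import shlex


def cmake_defines(library_paths):
    """Turn {cmake_variable: path} into a list of -D options for cmake."""
    return [f"-D{name}={path}" for name, path in library_paths.items()]


# Hypothetical values: copy the variable names from the CMake error output
# and point each one at the matching file of your CUDA installation.
missing = {
    "CUDA_CUDA_LIBRARY": "/usr/local/cuda/lib64/stubs/libcuda.so",
    "CUDA_cudart_LIBRARY": "/usr/local/cuda/lib64/libcudart.so",
}

flags = cmake_defines(missing)
# Print a command line that can be pasted into the shell (".." is the source dir).
print("cmake " + " ".join(shlex.quote(flag) for flag in flags) + " ..")
```

Keeping the mapping in one place makes it easy to re-run cmake with the same paths after cleaning the build directory.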

3. Error when running sudo make: unsupported GPU architecture

nvcc fatal   : Unsupported gpu architecture 'compute_30' CMake Error at cuda_compile_1_generated_gpu_mat.cu.o.RELEASE.cmake:222 (message):

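Fix (1): edit OpenCVDetectCUDA.cmake in the cmake directory of the OpenCV source tree. Around line 89, delete the 3.x and 2.x entries, so that the block reads as in the first excerpt below.

Fix (2): alternatively, I found that building from the latest OpenCV 4.8 sources does not hit this problem at all.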

  set(__cuda_arch_ptx "")
  if(CUDA_GENERATION STREQUAL "Maxwell")
    set(__cuda_arch_bin "5.0 5.2")
  elseif(CUDA_GENERATION STREQUAL "Pascal")
    set(__cuda_arch_bin "6.0 6.1")
  elseif(CUDA_GENERATION STREQUAL "Volta")
    set(__cuda_arch_bin "7.0")
  elseif(CUDA_GENERATION STREQUAL "Turing")
    set(__cuda_arch_bin "7.5")
  elseif(CUDA_GENERATION STREQUAL "Auto")

Around line 136, because I am using CUDA 11.6, I deleted the 3.x entries from the final else() branch:

    else()
      if(CUDA_VERSION VERSION_LESS "9.0")
        set(__cuda_arch_bin "2.0 3.0 3.5 3.7 5.0 5.2 6.0 6.1")
      elseif(CUDA_VERSION VERSION_LESS "10.0")
        set(__cuda_arch_bin "3.0 3.5 3.7 5.0 5.2 6.0 6.1 7.0")
      else()
        set(__cuda_arch_bin "5.0 5.2 6.0 6.1 7.0 7.5")
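The same filtering can be sketched in a few lines of Python, which is handy for producing an architecture list without editing the file by hand. This is an illustration under assumptions: the function name and the 5.0 cut-off are my own choices that mirror the edit above, and passing the result through OpenCV's CUDA_ARCH_BIN CMake option instead of patching OpenCVDetectCUDA.cmake is something I did not test in this build.

```python
def filter_arch_bin(archs, minimum="5.0"):
    """Keep only compute capabilities >= minimum, e.g. drop every 2.x and 3.x entry."""
    def as_tuple(version):
        # "6.1" -> (6, 1), so versions compare numerically rather than as strings
        return tuple(int(part) for part in version.split("."))

    floor = as_tuple(minimum)
    return " ".join(arch for arch in archs if as_tuple(arch) >= floor)


# One of the lists from OpenCVDetectCUDA.cmake before the edit.
original = "3.0 3.5 3.7 5.0 5.2 6.0 6.1 7.0".split()
arch_bin = filter_arch_bin(original)
print(f'-D CUDA_ARCH_BIN="{arch_bin}"')
```

Comparing tuples of integers avoids the string-comparison trap where "10.0" would sort before "5.0".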

4. Error when running sudo make: cannot find -ltrue

/usr/bin/ld: cannot find -ltrue
collect2: error: ld returned 1 exit status

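Fix: see "Build opencv with cuda error : /usr/bin/ld: cannot find -ltrue", Issue #19794, opencv/opencv, GitHub. In short, the error appears when the cmake command contains the option -DCUDA_nppicom_LIBRARY=true. The blog post recommends changing it to -DCUDA_nppicom_LIBRARY=false; I simply deleted the option. Deleting it may bring back the error from section 2, but after switching to OpenCV 4.8 I never saw that error again.

My build environment: Python 3.8, OpenCV 4.8, CUDA 11.6, cudatoolkit-510.39, gcc-9.4.0.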
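A *_LIBRARY variable is expected to hold a library path, so a value of true presumably ends up on the link line as -ltrue. The sketch below shows one way to strip such options from a list of cmake arguments before running it; the helper name and the sample argument list are hypothetical.

```python
import re

# Matches a single option such as -DCUDA_nppicom_LIBRARY=true (any letter case for "true").
_TRUE_LIBRARY = re.compile(r"^-D\w+_LIBRARY=true$", re.IGNORECASE)


def drop_true_library_flags(args):
    """Return args without any -D<name>_LIBRARY=true option."""
    return [arg for arg in args if not _TRUE_LIBRARY.match(arg)]


# Hypothetical cmake arguments copied from a tutorial.
args = ["-DWITH_CUDA=ON", "-DCUDA_nppicom_LIBRARY=true", ".."]
cleaned = drop_true_library_flags(args)
print("cmake " + " ".join(cleaned))
```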

5. Error when reading an RTSP stream on the GPU with OpenCV:

cv2.error: OpenCV(4.8.0) /xxx/opencv-4.8.0/modules/core/include/opencv2/core/private.cuda.hpp:112: error: (-213:The function/feature is not implemented) The called functionality is disabled for current build or platform in function 'throw_no_cuda'

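Fix: rebuild with -D WITH_NVCUVID=ON added to the cmake command. References: "【AVD】Linux 编译支持 Cuda 的 OpenCV 4.6，解决报错 throw_no_cuda" ([AVD] Building OpenCV 4.6 with CUDA support on Linux and fixing the throw_no_cuda error), 深海Enoch's blog, CSDN; and "opencv - Reading a video on GPU using C++ and CUDA", Stack Overflow.

That is: download the NVIDIA VIDEO CODEC SDK from the NVIDIA website, unpack it, and copy the three files in its Interface directory into the include directory of the CUDA installation, for example /usr/local/cuda-11.6/targets/x86_64-linux/include/. Then rebuild OpenCV.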
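After rebuilding, it helps to confirm that the new build really includes NVCUVID before testing the stream again. The sketch below parses the text returned by cv2.getBuildInformation(); it assumes that the "NVIDIA CUDA" line of that report lists NVCUVID when the feature is enabled, which is how I understand the report to be laid out, so treat the check as a heuristic. The helper names are my own.

```python
def cuda_line(build_info):
    """Return the 'NVIDIA CUDA' line of an OpenCV build report, or None if absent."""
    for line in build_info.splitlines():
        if line.strip().startswith("NVIDIA CUDA"):
            return line.strip()
    return None


def has_nvcuvid(build_info):
    """Heuristic: True if the CUDA line of the report mentions NVCUVID."""
    line = cuda_line(build_info)
    return line is not None and "NVCUVID" in line


if __name__ == "__main__":
    try:
        import cv2  # only needed for the live check
        print("NVCUVID enabled:", has_nvcuvid(cv2.getBuildInformation()))
    except ImportError:
        print("cv2 is not installed in this environment")
```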

6. Converting a cv2.cuda.GpuMat to a PyTorch tensor:

Reference: https://www.simonwenkel.com/notes/software_libraries/opencv/opencv-cuda-integration.html#opencv-cuda-intro

and:

"Allow access to CUDA pointers for interoperability with other libraries" by pwuertz, Pull Request #16513, opencv/opencv, GitHub: https://github.com/opencv/opencv/pull/16513#issue-371438498

The PR description reads: "This is a proposal for adding CV_WRAP compatible cudaPtr() getter methods to GpuMat and Stream, required for enabling interoperability between OpenCV and other CUDA supporting python libraries like Numba, CuPy, PyTorch, etc. Here is an example for sharing a GpuMat with CuPy:"

    import cv2 as cv
    import cupy as cp

    # Create GPU array with OpenCV
    data_gpu_cv = cv.cuda_GpuMat()
    data_gpu_cv.upload(np.eye(64, dtype=np.float32))

    # Modify the same GPU array with CuPy
    data_gpu_cp = cp.asarray(CudaArrayInterface(data_gpu_cv))
    data_gpu_cp *= 42.0

    # Download and verify
    assert np.allclose(data_gpu_cp.get(), np.eye(64) * 42.0)

"In this example CudaArrayInterface is a (incomplete) adapter class that implements the cuda array interface used by other frameworks:"

    class CudaArrayInterface:
        def __init__(self, gpu_mat):
            w, h = gpu_mat.size()
            type_map = {
                cv.CV_8U: "u1", cv.CV_8S: "i1",
                cv.CV_16U: "u2", cv.CV_16S: "i2",
                cv.CV_32S: "i4",
                cv.CV_32F: "f4", cv.CV_64F: "f8",
            }
            self.__cuda_array_interface__ = {
                "version": 2,
                "shape": (h, w),
                "data": (gpu_mat.cudaPtr(), False),
                "typestr": type_map[gpu_mat.type()],
                "strides": (gpu_mat.step, gpu_mat.elemSize()),
            }

"If possible, I'd like to implement __cuda_array_interface__ within the GpuMat python binding in a future PR (not sure how to define a python property using the wrapper generator though)."

In short, the idea is to add an interface between GpuMat and ndarray-like consumers. The two references above use "version": 2; the latest version is 3, and there are some differences between them. The latest official documentation for __array_interface__ is "The array interface protocol" (NumPy v1.24 Manual). My adapted version for a three-channel frame:

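import cv2

class CudaArrayInterface:
    """Adapter exposing a cv2.cuda.GpuMat through __cuda_array_interface__ (version 3)."""
    def __init__(self, gpu_mat):
        w, h = gpu_mat.size()
        c = gpu_mat.channels()
        # Testing showed that in OpenCV 4.8 GpuMat.type() returns an integer that
        # combines depth and channel count (16 for an 8-bit, 3-channel frame),
        # so the element type is looked up by depth() instead.
        type_map = {
            cv2.CV_8U: "|u1", cv2.CV_8S: "|i1",
            cv2.CV_16U: "<u2", cv2.CV_16S: "<i2",
            cv2.CV_32S: "<i4",
            cv2.CV_32F: "<f4", cv2.CV_64F: "<f8",
        }
        self.__cuda_array_interface__ = {
            "version": 3,
            "shape": (h, w, c),
            "data": (gpu_mat.cudaPtr(), False),
            "typestr": type_map[gpu_mat.depth()],
            # The optional "descr" entry is left out: it only matters for structured element types.
            # Strides are in bytes: one row, one pixel, one channel value.
            # See https://numpy.org/doc/1.24/reference/generated/numpy.ndarray.strides.html
            "strides": (gpu_mat.step, gpu_mat.elemSize(), gpu_mat.elemSize1()),
        }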

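Then use torch.as_tensor() or torch.tensor() to turn the GpuMat into a tensor; remember to set device to cuda. torch.tensor() always copies the data, whereas torch.as_tensor() may share the GpuMat's memory, in which case the tensor is only valid as long as the GpuMat is alive and unmodified.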

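import cv2
import torch

def main(file_path):
    cap = cv2.cudacodec.createVideoReader(file_path)
    while True:
        ret, frame = cap.nextFrame()
        if not ret:
            break
        # The decoded GpuMat has four channels (it includes an alpha channel);
        # convert it to the three-channel BGR format.
        frame = cv2.cuda.cvtColor(frame, cv2.COLOR_BGRA2BGR)
        # Copy the frame into a CUDA tensor through the adapter class defined above.
        tensor = torch.tensor(CudaArrayInterface(frame), device=torch.device('cuda'))
        print('tensor:', tensor.shape)
    # cudacodec.VideoReader has no release() method (unlike cv2.VideoCapture);
    # the decoder is freed when the reader object is garbage-collected.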

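The length of "strides" must equal the length of "shape". Each entry is the number of bytes to step in memory to reach the next element along that axis: the next row, the next pixel, the next channel value. Based on the NumPy documentation, my inference is that the third entry of "strides" is best set to ndarray.itemsize, where the ndarray I looked at is the image obtained by downloading the GpuMat to the CPU. This relies on the description of the ndarray.strides attribute, which is fairly hard to follow (see "numpy.ndarray.strides", NumPy v1.24 Manual).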
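A tiny pure-Python model makes the three stride values concrete. It is only an illustration: the image size and the padded row length of 16 bytes are made-up numbers standing in for GpuMat.step, and a bytearray stands in for the device buffer.

```python
def byte_offset(index, strides):
    """Offset in bytes of element `index`, following the ndarray.strides definition."""
    return sum(i * s for i, s in zip(index, strides))


h, w, c = 2, 3, 3            # tiny BGR image: 2 rows, 3 columns, 3 channels
itemsize = 1                 # bytes per channel value (uint8), cf. GpuMat.elemSize1()
elem_size = c * itemsize     # bytes per pixel, cf. GpuMat.elemSize()
step = 16                    # bytes per row including padding, cf. GpuMat.step (assumed value)
strides = (step, elem_size, itemsize)

buf = bytearray(h * step)    # stand-in for the pitched device buffer
# Write the blue channel of the pixel at row 1, column 2.
buf[byte_offset((1, 2, 0), strides)] = 255
print(strides, buf.index(255))
```

The write lands at byte 1*16 + 2*3 + 0*1 = 22, not at 1*9 + 2*3 = 15 as it would in a tightly packed image. This is why the row stride must come from GpuMat.step rather than from width times channels.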

7. Error when launching multiple processes that use CUDA via multiprocessing

Error:

cv2.error: OpenCV(4.8.0) /xx/opencv-4.8.0/modules/core/src/cuda/gpu_mat.cu:121: error: (-217:Gpu API call) initialization error in function 'allocate'

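The likely cause is that CUDA is initialized first and the processes are started afterwards. Add the following line before starting any process:

multiprocessing.set_start_method('spawn')
# Explanation from ERNIE Bot (文心一言): multiprocessing.set_start_method('spawn') sets the method used to start new processes. Python can start processes in several ways, and 'spawn' is a commonly used one.

# After this call, Python uses 'spawn' to create new processes. The method is available on every operating system and avoids some problems related to the fork() system call.

# 'fork' copies the parent's memory space into the child, including resources such as GUI state (or, in this case, an already initialized CUDA context), which can cause errors in the child. 'spawn' starts a fresh interpreter and avoids these problems. Note that 'fork' does not exist on Windows, where 'spawn' is the only start method.

# The start methods differ in performance ('spawn' is slower to start than 'fork'), so choose the one that fits your needs.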

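The approach that ran successfully in my tests:

import multiprocessing
from multiprocessing import Process

if __name__ == '__main__':
    # Must be called once, before any Process is created.
    multiprocessing.set_start_method('spawn')
    file_path = 'xxx'
    p1 = Process(target=main, args=(file_path,))
    p2 = Process(target=main, args=(file_path,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()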
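A variant that leaves the global start method untouched is to request a 'spawn' context and create the processes from it. The sketch below is an alternative I am suggesting, not the code I ran for this article; run_spawned is a hypothetical helper name, and main and 'xxx' refer to the reader function and placeholder path used above.

```python
import multiprocessing as mp


def run_spawned(target, arg_tuples):
    """Start one 'spawn' process per argument tuple and return their exit codes."""
    ctx = mp.get_context("spawn")  # local context; the global start method is unchanged
    procs = [ctx.Process(target=target, args=args) for args in arg_tuples]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":
    # With the reader function from this article in scope, the call would be:
    # exit_codes = run_spawned(main, [("xxx",), ("xxx",)])
    print(mp.get_context("spawn").get_start_method())
```

Using a context is convenient when another library in the same program has already chosen a start method, because set_start_method() may only be called once.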

References: https://github.com/explosion/spaCy/issues/5507; "python 3.x - Cupy get error in multithread.pool if GPU already used", Stack Overflow.