Hardware encoding and decoding with OpenCV and FFmpeg using the NVIDIA Video Codec SDK

Original article, last modified 2023-10-11 17:17:00

License: CC 4.0 BY-SA

Tags: #opencv #ffmpeg #video-codec #cuda #cudacodec

First published 2023-07-01 11:59:04

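This article explains how to combine OpenCV with CUDA and the FFmpeg libraries for hardware-accelerated video encoding and decoding. It first checks whether the system FFmpeg libraries support CUDA encoding and decoding, then covers building OpenCV from source with hardware acceleration enabled. Next it gives decoding and encoding test programs that show the performance advantage of GPU decoding over the CPU at different resolutions. Finally, it covers cv::cudacodec::VideoReader and cv::cudacodec::VideoWriter, including custom callbacks for encoded data and encoder parameter configuration.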

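OpenCV has supported CUDA acceleration for a long time, but mostly in its image-processing modules.

For reading video (including live streams) and writing video, OpenCV can use FFmpeg as its backend, which normally means CPU software encoding and decoding. If FFmpeg was built with GPU hardware encoding/decoding, OpenCV's interfaces can use hardware encoding and decoding directly.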

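Contents

1. Does FFmpeg's avcodec library support CUDA encoding/decoding?
- 1.1 Supported by the system libraries
- 1.2 Not supported by the system libraries
2. Building OpenCV from source with hardware acceleration
3. Using `cv::cudacodec::VideoReader` and `cv::cudacodec::VideoWriter`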

1. Does FFmpeg's avcodec library support CUDA encoding/decoding?

1.1 Supported by the system libraries

If you don't want to install a pile of dependencies, you can simply download a static FFmpeg build (download link: ffmpeg).

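On Linux, FFmpeg development often uses the libavcodec library installed by the system (on Ubuntu: apt install libavcodec-dev). You can check what it supports directly with the ffmpeg command-line tool.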

Run ffmpeg -codecs | grep cuvid; the output looks like this:

DEV.LS h264                  H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m h264_qsv h264_cuvid ) (encoders: libx264 libx264rgb h264_nvenc h264_omx h264_qsv h264_v4l2m2m h264_vaapi nvenc nvenc_h264 )
 DEV.L. hevc                 H.265 / HEVC (High Efficiency Video Coding) (decoders: hevc hevc_qsv hevc_v4l2m2m hevc_cuvid ) (encoders: libx265 nvenc_hevc hevc_nvenc hevc_qsv hevc_v4l2m2m hevc_vaapi )
 DEVIL. mjpeg                Motion JPEG (decoders: mjpeg mjpeg_cuvid mjpeg_qsv ) (encoders: mjpeg mjpeg_qsv mjpeg_vaapi )
 DEV.L. mpeg1video           MPEG-1 video (decoders: mpeg1video mpeg1_v4l2m2m mpeg1_cuvid )
 DEV.L. mpeg2video           MPEG-2 video (decoders: mpeg2video mpegvideo mpeg2_v4l2m2m mpeg2_qsv mpeg2_cuvid ) (encoders: mpeg2video mpeg2_qsv mpeg2_vaapi )
 DEV.L. mpeg4                MPEG-4 part 2 (decoders: mpeg4 mpeg4_v4l2m2m mpeg4_cuvid ) (encoders: mpeg4 libxvid mpeg4_omx mpeg4_v4l2m2m )
 D.V.L. vc1                  SMPTE VC-1 (decoders: vc1 vc1_qsv vc1_v4l2m2m vc1_cuvid )
 DEV.L. vp8                  On2 VP8 (decoders: vp8 vp8_v4l2m2m libvpx vp8_cuvid vp8_qsv ) (encoders: libvpx vp8_v4l2m2m vp8_vaapi )
 DEV.L. vp9                  Google VP9 (decoders: vp9 vp9_v4l2m2m libvpx-vp9 vp9_cuvid vp9_qsv ) (encoders: libvpx-vp9 vp9_vaapi vp9_qsv )

The decoders lists show cuvid hardware decoders for many formats, and the encoders lists for H.264 and HEVC include the nvenc hardware encoders.
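To check this programmatically instead of by eye, one line of the ffmpeg -codecs output can be scanned for NVIDIA codec names. Below is a minimal C++ sketch. The helper findNvidiaCodecs and the HwCodecs struct are hypothetical names, and the sketch assumes the "(decoders: ...)" / "(encoders: ...)" layout shown above and that NVIDIA codecs can be recognised by the substrings cuvid and nvenc.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// NVIDIA codec names found on one line of `ffmpeg -codecs` output (hypothetical helper type).
struct HwCodecs {
    std::vector<std::string> cuvidDecoders;  // e.g. h264_cuvid
    std::vector<std::string> nvencEncoders;  // e.g. h264_nvenc, nvenc_h264
};

// Reads the names listed inside "(decoders: ...)" and "(encoders: ...)" and keeps
// decoders containing "cuvid" and encoders containing "nvenc".
HwCodecs findNvidiaCodecs(const std::string& line) {
    HwCodecs out;
    auto collect = [&line](const std::string& tag, const std::string& needle,
                           std::vector<std::string>& dst) {
        const auto start = line.find(tag);
        if (start == std::string::npos) return;    // this list is absent on the line
        const auto end = line.find(')', start);    // npos -> substr takes the rest
        std::istringstream names(line.substr(start + tag.size(), end - start - tag.size()));
        for (std::string name; names >> name;)
            if (name.find(needle) != std::string::npos) dst.push_back(name);
    };
    collect("(decoders:", "cuvid", out.cuvidDecoders);
    collect("(encoders:", "nvenc", out.nvencEncoders);
    return out;
}
```

In a real check you would run this over every line of the command's output; an empty result for all lines corresponds to the "not supported" case in section 1.2.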

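In this case the system avcodec on Linux supports CUDA-accelerated encoding and decoding, and you can request the hardware codec explicitly, for example: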

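// Generic lookup by codec ID (may return a software encoder such as libx264):
// const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
// Request the NVENC hardware encoder by name; returns nullptr if it is not available:
const AVCodec* codec = avcodec_find_encoder_by_name("h264_nvenc");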
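Because avcodec_find_encoder_by_name returns a null pointer when the named encoder is not built in (and even a found encoder can still fail later in avcodec_open2 if no suitable GPU or driver is present), portable code usually tries a preference list. The sketch below shows only that selection logic in plain C++. pickEncoder is a hypothetical helper, and the `available` list stands in for whatever availability check you use, for instance a non-null result from avcodec_find_encoder_by_name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the first encoder in `preferred` that also appears in `available`,
// or an empty string if none does (hypothetical helper, no FFmpeg calls).
std::string pickEncoder(const std::vector<std::string>& preferred,
                        const std::vector<std::string>& available) {
    for (const auto& name : preferred)
        if (std::find(available.begin(), available.end(), name) != available.end())
            return name;
    return {};
}
```

With a preference order such as {"h264_nvenc", "libx264"}, a machine without NVENC falls back to the software encoder instead of failing.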

For Windows, see the next section.

1.2 Not supported by the system libraries

If ffmpeg -codecs | grep cuvid prints nothing, the system FFmpeg libraries do not support hardware acceleration, and you need to build FFmpeg from source yourself.

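To check hardware-acceleration support on Windows, and to build FFmpeg from source with hardware acceleration on Windows or Linux, see the blog post 【ffmpeg学习源代码编译、英伟达硬件加速】 (FFmpeg: building from source and NVIDIA hardware acceleration), or the official guide 【Using FFmpeg with NVIDIA GPU Hardware Acceleration】.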

2. Building OpenCV from source with hardware acceleration

This section covers the CUDA-accelerated counterparts of OpenCV's CPU classes cv::VideoCapture and cv::VideoWriter, namely cv::cudacodec::VideoReader and cv::cudacodec::VideoWriter.

2.1 Build support

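OpenCV enables WITH_NVCUVID by default, but the build also depends on the NVIDIA VIDEO CODEC SDK, NVIDIA's library for video encoding and decoding, which provides two hardware-acceleration APIs: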

NVENCODE API, for accelerated video encoding
NVDECODE API (formerly NVCUVID API), for accelerated video decoding

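The SDK's web page also lists which codecs are accelerated. At the time of writing the latest version is Video_Codec_SDK_12.0.16, with these basic requirements: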

Windows: Driver version 522.25 or higher
Linux: Driver version 520.56.06 or higher
CUDA 11.0 or higher Toolkit
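Whether the installed driver is new enough can be checked by comparing dotted version strings field by field (the installed version appears, for example, in the nvidia-smi header). A minimal C++ sketch follows. parseVersion and meetsMinimum are hypothetical helpers, and the thresholds are simply the ones listed above for SDK 12.0.16.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a well-formed dotted version such as "520.56.06" into {520, 56, 6}.
std::vector<int> parseVersion(const std::string& text) {
    std::vector<int> parts;
    std::istringstream in(text);
    for (std::string field; std::getline(in, field, '.');)
        parts.push_back(std::stoi(field));
    return parts;
}

// True if `installed` >= `required`, comparing field by field; missing fields count as 0.
bool meetsMinimum(const std::string& installed, const std::string& required) {
    const auto a = parseVersion(installed), b = parseVersion(required);
    for (size_t i = 0; i < std::max(a.size(), b.size()); ++i) {
        const int x = i < a.size() ? a[i] : 0;
        const int y = i < b.size() ? b[i] : 0;
        if (x != y) return x > y;
    }
    return true;  // equal versions satisfy the minimum
}
```

For example, meetsMinimum("525.60.13", "520.56.06") is true on Linux, while a 515.x driver would not meet the 520.56.06 requirement.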

The system-installed FFmpeg 4.4 and CUDA both require Video Codec SDK 8.1 or later, so with a recent CUDA, or a system-installed FFmpeg 4.4 or newer, this acceleration library can be used directly.

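To build OpenCV on Linux with support for cv::cudacodec::VideoReader and cv::cudacodec::VideoWriter, first download the Video Codec SDK, then copy the entire contents of the Interface directory from the extracted SDK into CUDA's include directory.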
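On Linux this copy is a single shell command, but to keep to one language here is a C++17 std::filesystem sketch of the same step. copyCodecSdkHeaders is a hypothetical helper. Both paths are placeholders you must supply (the CUDA include directory is commonly something like /usr/local/cuda/include, but check your installation), and writing there usually needs root privileges.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Copies every regular file from <sdkRoot>/Interface into cudaInclude, overwriting
// existing copies. Returns the number of files copied. Throws fs::filesystem_error
// if the Interface directory is missing or a file cannot be written.
int copyCodecSdkHeaders(const fs::path& sdkRoot, const fs::path& cudaInclude) {
    int copied = 0;
    for (const auto& entry : fs::directory_iterator(sdkRoot / "Interface")) {
        if (!entry.is_regular_file()) continue;  // skip subdirectories, if any
        fs::copy_file(entry.path(), cudaInclude / entry.path().filename(),
                      fs::copy_options::overwrite_existing);
        ++copied;
    }
    return copied;
}
```

Re-running CMake for OpenCV after the copy should then pick up the headers.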

When OpenCV is then configured, the CMake output must include NVCUVID, as below:

  NVIDIA CUDA:                   YES (ver 11.2, CUFFT CUBLAS NVCUVID FAST_MATH)
    NVIDIA GPU arch:             52 61 70 75 86
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 8.1.1)
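The same check can be done from a program (the decoding test below prints cv::getBuildInformation() for this reason) by looking for the NVCUVID token on the "NVIDIA CUDA:" line. A minimal sketch in plain C++; cudaLineHasFeature is a hypothetical helper that assumes the line format shown above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if the "NVIDIA CUDA:" line of a build-information dump lists
// `feature` inside its parentheses, e.g. "(ver 11.2, CUFFT CUBLAS NVCUVID FAST_MATH)".
bool cudaLineHasFeature(const std::string& buildInfo, const std::string& feature) {
    std::istringstream lines(buildInfo);
    for (std::string line; std::getline(lines, line);) {
        if (line.find("NVIDIA CUDA:") == std::string::npos) continue;
        const auto open = line.find('('), close = line.find(')');
        if (open == std::string::npos || close == std::string::npos) return false;  // e.g. "NO"
        std::istringstream tokens(line.substr(open + 1, close - open - 1));
        for (std::string tok; tokens >> tok;)
            if (tok == feature) return true;
        return false;
    }
    return false;  // no CUDA line at all
}
```

In an OpenCV program the call would be cudaLineHasFeature(cv::getBuildInformation(), "NVCUVID").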

2.2 Test (decoding)

Reading video, i.e. decoding, serves as the example here. Full source:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <numeric>
#include "opencv2/opencv_modules.hpp"
#include <opencv2/core/utility.hpp>
#include <opencv2/core.hpp>
#include <opencv2/core/opengl.hpp>
#include <opencv2/cudacodec.hpp>
#include <opencv2/highgui.hpp>

int main(int argc, const char* argv[])
{
    std::cout << cv::getBuildInformation() << std::endl;
    // Replace this with your own stream or file
    const std::string fname = "rtmp://192.168.3.100:1935/live/xcp1";
    // Only valid when OpenCV was built with OpenGL; leave commented out otherwise
    //cv::cuda::setGlDevice();
    //cv::cuda::setGlDevice(1);

    cv::Mat frame;
    cv::VideoCapture reader(fname);   // CPU decoding through the FFmpeg backend
    if (!reader.isOpened())
    {
        std::cerr << "Can't open " << fname << std::endl;
        return -1;
    }
    cv::cuda::GpuMat d_frame;
    cv::Ptr<cv::cudacodec::VideoReader> d_reader = cv::cudacodec::createVideoReader(fname);  // NVDEC decoding
    cv::TickMeter tm;
    std::vector<double> cpu_times;
    std::vector<double> gpu_times;

    // Time up to 100 frames from each reader, alternating CPU and GPU
    for (int i = 0; i < 100; i++)
    {
        tm.reset(); tm.start();
        if (!reader.read(frame))
            break;
        tm.stop();
        cpu_times.push_back(tm.getTimeMilli());

        tm.reset(); tm.start();
        if (!d_reader->nextFrame(d_frame))
            break;
        tm.stop();
        gpu_times.push_back(tm.getTimeMilli());
    }

    if (!cpu_times.empty() && !gpu_times.empty())
    {
        std::cout << std::endl << "Results:" << std::endl;

        std::sort(cpu_times.begin(), cpu_times.end());
        std::sort(gpu_times.begin(), gpu_times.end());

        double cpu_avg = std::accumulate(cpu_times.begin(), cpu_times.end(), 0.0) / cpu_times.size();
        double gpu_avg = std::accumulate(gpu_times.begin(), gpu_times.end(), 0.0) / gpu_times.size();

        std::cout << "CPU : Avg : " << cpu_avg << " ms FPS : " << 1000.0 / cpu_avg << std::endl;
        std::cout << "GPU : Avg : " << gpu_avg << " ms FPS : " << 1000.0 / gpu_avg << std::endl;
    }

    return 0;
}
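The loop above records one time per frame and reports the mean; the timing vectors are sorted but the sorted order is not otherwise used. If you want a statistic that is less sensitive to occasional slow frames (for example while a live stream is still buffering), a median is a common complement. A minimal sketch; summarize and TimingStats are hypothetical names, and FPS is computed as 1000 / mean exactly as in the printout above.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct TimingStats {
    double meanMs;
    double medianMs;
    double fpsFromMean;  // 1000 / meanMs, matching the benchmark's FPS column
};

// Summarises per-frame times in milliseconds; expects a non-empty vector.
TimingStats summarize(std::vector<double> timesMs) {
    std::sort(timesMs.begin(), timesMs.end());
    const size_t n = timesMs.size();
    const double mean = std::accumulate(timesMs.begin(), timesMs.end(), 0.0) / n;
    const double median = (n % 2) ? timesMs[n / 2]
                                   : (timesMs[n / 2 - 1] + timesMs[n / 2]) / 2.0;
    return {mean, median, 1000.0 / mean};
}
```

Calling summarize(cpu_times) and summarize(gpu_times) after the loop would print both figures side by side.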

CMakeLists.txt:

cmake_minimum_required(VERSION 3.18)
project(test)
set(CMAKE_BUILD_TYPE "Release")

# opencv
set(OpenCV_DIR "/softwares/opencv/opencv-4.6.0/install/lib/cmake/opencv4/")
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})

message(STATUS "Found OpenCV version ${OpenCV_VERSION}")

link_libraries( ${OpenCV_LIBS} )
add_executable( ${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME}  
  ${OpenCV_LIBS}  
)

Test environment: CPU 11700, RTX 3080, CUDA 11.2, cuDNN 8.1.1.

2.2.1 Test: 1280x720, 3964 kb/s, 29.97 fps

Results:
CPU : Avg : 0.728801 ms FPS : 1372.12
GPU : Avg : 0.506997 ms FPS : 1972.4

2.2.2 Test: 2048x1080, q=2-31, 2785 kb/s, 25 fps

Results:
CPU : Avg : 2.83558 ms FPS : 352.662
GPU : Avg : 0.0964484 ms FPS : 10368.2

Comparing the two tests: at 720p the difference is small. At 2K, CPU decoding slows down markedly, and the GPU is about 30 times faster than the CPU.

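Note that the second video has a higher resolution but a lower bitrate than the first. The CPU dropping from 1372 to 353 frames per second is expected; the GPU, however, goes up from 1972 to 10368, which at first glance seems counterintuitive, since both should drop. Possibly the higher resolution lets the GPU make better use of its throughput.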
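The speedups quoted above follow directly from the measured averages, and the pixel count puts the CPU slowdown in context:

$$\frac{0.728801}{0.506997} \approx 1.44 \;\; (720\text{p}), \qquad \frac{2.83558}{0.0964484} \approx 29.4 \;\; (2\text{K})$$

$$\frac{2048 \times 1080}{1280 \times 720} = \frac{2211840}{921600} = 2.4, \qquad \frac{2.83558}{0.728801} \approx 3.89$$

So the CPU's per-frame decode time grew by roughly 3.9x for 2.4x the pixels, while at 720p the GPU was only about 1.4x faster than the CPU.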

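Also, in practice the GPU path yields a cv::cuda::GpuMat; copying it to a cv::Mat adds some cost, so choose according to your use case. Image processing can be CUDA-accelerated too, for example with the opencv_cudaimgproc and opencv_cudafeatures2d modules.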

2.3 Test (encoding)

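A cv::cudacodec::VideoWriter object can only output raw H.264/H.265 elementary streams (unlike cv::VideoWriter, which writes container files directly). There are two factory functions; the second overload also lets you set the encoder parameters.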

CV_EXPORTS_W Ptr<cudacodec::VideoWriter> createVideoWriter(const String& fileName, const Size frameSize, const Codec codec = Codec::H264, const double fps = 25.0,
    const ColorFormat colorFormat = ColorFormat::BGR, Ptr<EncoderCallback> encoderCallback = 0, const Stream& stream = Stream::Null());

CV_EXPORTS_W Ptr<cudacodec::VideoWriter> createVideoWriter(const String& fileName, const Size frameSize, const Codec codec, const double fps,  const ColorFormat colorFormat,
    const EncoderParams& params, Ptr<EncoderCallback> encoderCallback = 0, const Stream& stream = Stream::Null());

Full code:

#include <iostream>

#include "opencv2/opencv_modules.hpp"

#if defined(HAVE_OPENCV_CUDACODEC)

#include <cstdio>
#include <string>
#include <vector>
#include <numeric>

#include "opencv2/core.hpp"
#include "opencv2/core/utility.hpp"   // cv::TickMeter
#include "opencv2/videoio.hpp"        // cv::VideoCapture, cv::VideoWriter
#include "opencv2/cudacodec.hpp"

int main()
{
    std::string filename = R"(E:\DeepLearning\Paddle\PaddleDetection-2.6.0\demo\gl_4k.mp4)";
    cv::VideoCapture reader(filename);

    if(!reader.isOpened()) {
        std::cerr << "Can't open input video file" << std::endl;
        return -1;
    }

    double fps = reader.get(cv::CAP_PROP_FPS);
    int framecount = static_cast<int>(reader.get(cv::CAP_PROP_FRAME_COUNT));
    cv::Size size(static_cast<int>(reader.get(cv::CAP_PROP_FRAME_WIDTH)),
                  static_cast<int>(reader.get(cv::CAP_PROP_FRAME_HEIGHT)));
    std::cout << "Video info: size = " << size << ", fps = " << fps << ", frames = " << framecount << std::endl;

    cv::cuda::printShortCudaDeviceInfo(cv::cuda::getDevice());

    cv::VideoWriter writer;
    cv::Ptr<cv::cudacodec::VideoWriter> d_writer;

    cv::Mat frame;
    cv::cuda::GpuMat d_frame;
    cv::cuda::Stream stream;

    cv::TickMeter tm;

    ////--------------------   CPU: XVID into an AVI container
    std::string outputFilename = "output_cpu.avi";
    if(!writer.open(outputFilename, cv::VideoWriter::fourcc('X', 'V', 'I', 'D'), fps, size))
        return -1;
    std::cout << "Writing to " << outputFilename << std::endl;

    tm.reset();
    for(;;) {
        reader >> frame;
        if(frame.empty()) {
            std::cout << "Stop" << std::endl;
            break;
        }
        tm.start();            // only the write (encode) call is timed
        writer.write(frame);
        tm.stop();
    }
    writer.release();
    auto tt1 = tm.getTimeSec();

    ////--------------------   GPU: NVENC into a raw H.264 stream
    reader.release();
    reader.open(filename);

    outputFilename = "output_gpu.h264";
    d_writer = cv::cudacodec::createVideoWriter(outputFilename, size, cv::cudacodec::Codec::H264, fps,
                                                cv::cudacodec::ColorFormat::BGR, nullptr, stream);
    std::cout << "Writing to " << outputFilename << std::endl;

    tm.reset();
    for(;;) {
        reader >> frame;
        if(frame.empty()) {
            std::cout << "Stop" << std::endl;
            break;
        }

        d_frame.upload(frame, stream);  // host -> device copy, not timed
        tm.start();
        d_writer->write(d_frame);
        tm.stop();
    }
    d_writer->release();  // finish encoding and flush (not included in the timing)
    auto tt2 = tm.getTimeSec();

    printf("cpu time: %.3f sec (%.1ffps), gpu time: %.3f sec (%.1ffps)\n",
           tt1, framecount / tt1, tt2, framecount / tt2);
    return 0;
}
#else
int main()
{
    std::cout << "OpenCV was built without CUDA Video encoding support" << std::endl;
    return 0;
}
#endif

2.3.1 2048x1080 resolution

Result: cpu time: 5.940 sec (90.9fps), gpu time: 0.138 sec (3926.1fps)
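Note that the measured time includes writing to disk, and the AVI and H.264 outputs have very different bitrates (about 11 Mbps vs 3 Mbps), so their I/O costs differ.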

2.3.2 960x544 resolution

The difference is even more pronounced.

3. Using `cv::cudacodec::VideoReader` and `cv::cudacodec::VideoWriter`

3.1 Encoding with cv::cudacodec::VideoWriter

3.1.1 Handling encoded data with a custom callback

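With a default cv::cudacodec::VideoWriter, as created in the code above, each write call saves the encoded bitstream straight into the raw-stream file, so the user never gets access to the encoded data.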

d_writer = cv::cudacodec::createVideoWriter("output_gpu.h264", size, cv::cudacodec::Codec::H264, fps, cv::cudacodec::ColorFormat::BGR, 
                                            NULL, stream);
...
d_writer->write(d_frame); // write a frame

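To process each encoded frame yourself, derive a class from cv::cudacodec::EncoderCallback and pass an instance to the encoder, which then calls it back. The example below only saves the raw H.264 stream to a file: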

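class EncCb : public cv::cudacodec::EncoderCallback {
    FILE *p = nullptr;
public:
    // Opens the output file for the raw H.264 stream
    explicit EncCb(const char* path = "enccb.h264") {
        p = fopen(path, "wb");
        if (!p)
            fprintf(stderr, "Can't open %s\n", path);
    }

    // Receives the bitstream of one or more encoded frames
    virtual void onEncoded(std::vector<std::vector<uint8_t>> vPacket) override {
        if (!p) return;
        for(const auto & pack: vPacket)
            fwrite(pack.data(), 1, pack.size(), p);
    }

    // Called once encoding has finished
    virtual void onEncodingFinished() override {
        printf("write done!\n");
        if (p) {
            fclose(p);
            p = nullptr;
        }
    }

    ~EncCb() {
        if (p) fclose(p);  // in case onEncodingFinished() was never called
    }
};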

// Test code
d_writer = cv::cudacodec::createVideoWriter("output_gpu.h264", size, cv::cudacodec::Codec::H264, fps, cv::cudacodec::ColorFormat::BGR, 
                                            cv::makePtr<EncCb>(), stream);  // callback object
...
d_writer->write(d_frame); // write a frame

3.1.2 Encoder parameter configuration

Encoder parameters are set through cv::cudacodec::EncoderParams, which is passed to the second createVideoWriter overload.

cv::cudacodec::EncoderParams encParams;  // default values

d_writer = cv::cudacodec::createVideoWriter(outputFilename, size, cv::cudacodec::Codec::H264, fps, cv::cudacodec::ColorFormat::BGR,
                                            encParams, // encoder parameters
                                            cv::makePtr<EncCb>("enccb.h264"), stream);

3.3 Decoding with cv::cudacodec::VideoReader

Basic usage is similar to cv::VideoCapture; the only difference is whether frames live in GPU or CPU memory. Both versions:

// CPU decoding with cv::VideoCapture
cv::Mat frame;
cv::VideoCapture reader(fname);
for (;;)
{
    if (!reader.read(frame))
        break;

    cv::imshow("CPU", frame);
    if (cv::waitKey(3) > 0)
        break;
}

// GPU decoding with cv::cudacodec::VideoReader
cv::cuda::GpuMat d_frame;
cv::Ptr<cv::cudacodec::VideoReader> d_reader = cv::cudacodec::createVideoReader(fname);
for (;;)
{
    if (!d_reader->nextFrame(d_frame))
        break;

    d_frame.download(frame);   // copy GPU data to the CPU for display

    cv::imshow("GPU", frame);
    if (cv::waitKey(3) > 0)
        break;
}

There are two overloads for creating a VideoReader:

/** @brief Creates video reader.

@param filename Name of the input video file.
@param sourceParams Pass through parameters for VideoCapure.  VideoCapture with the FFMpeg back end (CAP_FFMPEG) is used to parse the video input.
The `sourceParams` parameter allows to specify extra parameters encoded as pairs `(paramId_1, paramValue_1, paramId_2, paramValue_2, ...)`.
    See cv::VideoCaptureProperties
e.g. when streaming from an RTSP source CAP_PROP_OPEN_TIMEOUT_MSEC may need to be set.
@param params Initializaton parameters. See cv::cudacodec::VideoReaderInitParams.

FFMPEG is used to read videos. User can implement own demultiplexing with cudacodec::RawVideoSource
 */
CV_EXPORTS_W Ptr<VideoReader> createVideoReader(const String& filename, const std::vector<int>& sourceParams = {}, const VideoReaderInitParams params = VideoReaderInitParams());

/** @overload
@param source RAW video source implemented by user.
@param params Initializaton parameters. See cv::cudacodec::VideoReaderInitParams.
*/
CV_EXPORTS_W Ptr<VideoReader> createVideoReader(const Ptr<RawVideoSource>& source, const VideoReaderInitParams params = VideoReaderInitParams());

When the FFmpeg backend is used, extra parameters can be passed in sourceParams as a flat array of (id, value) pairs.

3.3.1 VideoReaderInitParams

/** @brief VideoReader initialization parameters
@param udpSource Remove validation which can cause VideoReader() to throw exceptions when reading from a UDP source.
@param allowFrameDrop Allow frames to be dropped when ingesting from a live capture source to prevent delay and eventual disconnection
when calls to nextFrame()/grab() cannot keep up with the source's fps.  Only use if delay and disconnection are a problem, i.e. not when decoding from
video files where setting this flag will cause frames to be unnecessarily discarded.
@param minNumDecodeSurfaces Minimum number of internal decode surfaces used by the hardware decoder.  NVDEC will automatically determine the minimum number of
surfaces it requires for correct functionality and optimal video memory usage but not necessarily for best performance, which depends on the design of the
overall application. The optimal number of decode surfaces (in terms of performance and memory utilization) should be decided by experimentation for each application,
but it cannot go below the number determined by NVDEC.
@param rawMode Allow the raw encoded data which has been read up until the last call to grab() to be retrieved by calling retrieve(rawData,RAW_DATA_IDX).
@param targetSz Post-processed size (width/height should be multiples of 2) of the output frame, defaults to the size of the encoded video source.
@param srcRoi Region of interest (x/width should be multiples of 4 and y/height multiples of 2) decoded from video source, defaults to the full frame.
@param targetRoi Region of interest (x/width should be multiples of 4 and y/height multiples of 2) within the output frame to copy and resize the decoded frame to,
defaults to the full frame.
*/
struct CV_EXPORTS_W_SIMPLE VideoReaderInitParams {
    CV_WRAP VideoReaderInitParams() : udpSource(false), allowFrameDrop(false), minNumDecodeSurfaces(0), rawMode(0) {};
    CV_PROP_RW bool udpSource;
    CV_PROP_RW bool allowFrameDrop;
    CV_PROP_RW int minNumDecodeSurfaces;
    CV_PROP_RW bool rawMode;
    CV_PROP_RW cv::Size targetSz;
    CV_PROP_RW cv::Rect srcRoi;
    CV_PROP_RW cv::Rect targetRoi;
};
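The ROI constraints in the comment above (x and width multiples of 4, y and height multiples of 2) are easy to violate with arbitrary rectangles. Below is a minimal C++ sketch that shrinks a rectangle until it satisfies them. Roi is a stand-in for cv::Rect so the sketch compiles without OpenCV, and alignRoi and alignDown are hypothetical helpers. Shrinking (origin rounded up, far edge rounded down) is one design choice; rounding outward would also satisfy the constraints but could extend past the requested region.

```cpp
#include <algorithm>
#include <cassert>

// Plain stand-in for cv::Rect so this sketch compiles without OpenCV.
struct Roi { int x, y, width, height; };

// Rounds a non-negative value down to a multiple of m.
constexpr int alignDown(int v, int m) { return v - v % m; }

// Shrinks an ROI so x/width are multiples of 4 and y/height multiples of 2,
// keeping the result inside the requested region. Expects non-negative input.
Roi alignRoi(const Roi& r) {
    const int x0 = alignDown(r.x + 3, 4);  // round origin up
    const int y0 = alignDown(r.y + 1, 2);
    const int w = alignDown(std::max(0, r.x + r.width - x0), 4);   // round far edge down
    const int h = alignDown(std::max(0, r.y + r.height - y0), 2);
    return {x0, y0, w, h};
}
```

targetSz only needs an even width and height, which alignDown(w, 2) and alignDown(h, 2) give directly.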

3.3.2 RawVideoSource

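An abstract class that must be subclassed before it can be instantiated; it lets the decoder consume custom data that you demultiplex yourself.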

/** @brief Interface for video demultiplexing. :

User can implement own demultiplexing by implementing this interface.
 */
class CV_EXPORTS_W RawVideoSource
{
public:
    virtual ~RawVideoSource() {}

    /** @brief Returns next packet with RAW video frame.

    @param data Pointer to frame data.
    @param size Size in bytes of current frame.
     */
    virtual bool getNextPacket(unsigned char** data, size_t* size) = 0;

    /** @brief Returns true if the last packet contained a key frame.
     */
    virtual bool lastPacketContainsKeyFrame() const { return false; }

    /** @brief Returns information about video file format.
    */
    virtual FormatInfo format() const = 0;

    /** @brief Updates the coded width and height inside format.
    */
    virtual void updateFormat(const FormatInfo& videoFormat) = 0;

    /** @brief Returns any extra data associated with the video source.

    @param extraData 1D cv::Mat containing the extra data if it exists.
     */
    virtual void getExtraData(cv::Mat& extraData) const = 0;

    /** @brief Retrieves the specified property used by the VideoSource.

    @param propertyId Property identifier from cv::VideoCaptureProperties (eg. cv::CAP_PROP_POS_MSEC, cv::CAP_PROP_POS_FRAMES, ...)
    or one from @ref videoio_flags_others.
    @param propertyVal Value for the specified property.

    @return `true` unless the property is unset set or not supported.
     */
    virtual bool get(const int propertyId, double& propertyVal) const = 0;
};
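A getNextPacket() implementation needs some way to cut the input into packets. One possible building block, assuming the input is an H.264 Annex-B elementary stream (such as the .h264 files written in section 2.3), is to split on start codes and read each NAL unit's type. The sketch below does not implement RawVideoSource itself, which also needs FormatInfo and the other members above. splitAnnexB, containsIdr and NalUnit are hypothetical names. It handles H.264 only (HEVC encodes the NAL type differently), and grouping NAL units into whole access units is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One NAL unit inside an H.264 Annex-B byte stream.
struct NalUnit {
    size_t offset;  // first byte after the start code
    size_t size;    // payload length up to the next start code (or end of buffer)
    int type;       // nal_unit_type: low 5 bits of the first payload byte
};

// Splits a buffer on 00 00 01 start codes (a 4-byte 00 00 00 01 code is handled too).
std::vector<NalUnit> splitAnnexB(const std::vector<uint8_t>& buf) {
    std::vector<size_t> starts;  // positions just after each start code
    for (size_t i = 0; i + 2 < buf.size(); ++i) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            starts.push_back(i + 3);
            i += 2;  // skip the rest of the start code
        }
    }
    std::vector<NalUnit> units;
    for (size_t k = 0; k < starts.size(); ++k) {
        size_t end = (k + 1 < starts.size()) ? starts[k + 1] - 3 : buf.size();
        // Drop trailing zero bytes (the extra 0 of a 4-byte start code, or
        // trailing_zero_8bits). Simplification: this would also trim cabac_zero_words.
        while (end > starts[k] && buf[end - 1] == 0) --end;
        if (end > starts[k])
            units.push_back({starts[k], end - starts[k], buf[starts[k]] & 0x1F});
    }
    return units;
}

// IDR slices (nal_unit_type 5) mark key frames, which lastPacketContainsKeyFrame() can report.
bool containsIdr(const std::vector<NalUnit>& units) {
    for (const auto& u : units)
        if (u.type == 5) return true;
    return false;
}
```

In a RawVideoSource subclass, getNextPacket() could then hand out &buffer[u.offset] and u.size for each unit in turn.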