Video ROI Encoding: Implementation, Testing, and Analysis

音视频开发进阶

Original link: https://mp.weixin.qq.com/s?__biz=MzA4MjU1MDk3Ng==&mid=2451533476&idx=1&sn=d452b54c16632bc3eb28feea31ee6567&chksm=891921482fee5c6c5da7f9fa29f03357dfca456a76ee27005421aa6a96014d76da54f72f8674&scene=126&sessionid=0

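1. What Is ROI Encoding

You have probably heard the term ROI a lot, especially in the FFmpeg WeChat group (@大师兄). Usually, though, ROI means return on investment, whereas in video coding it means region of interest. In short, ROI encoding raises the encoding quality of a specific region of the picture. In 99% of cases, that region is a face.

2. Implementing ROI Encoding

Implementing ROI encoding takes two steps:

Find the region of interest (😊)
Raise the encoding quality of that region

The first step is largely handled by AI; the second is done by the encoder.

The basic mechanism an encoder uses to change the quality of a specific region is the quantization parameter, QP. Put simply, the larger the QP, the larger the quantization error and the lower the quality; the smaller the QP, the smaller the quantization error and the higher the quality. Raising the quality of a region therefore means lowering its QP.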
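To make the link between QP and quantization error concrete, here is a minimal sketch for 8-bit H.264. It relies on the commonly cited step-size table for QP 0 to 5 and on the step size doubling every 6 QP. The function name h264_qstep is made up for this illustration and is not part of any encoder API.

```c
#include <assert.h>

/* Quantizer step sizes for 8-bit H.264 at QP 0..5. The step size doubles
 * each time QP increases by 6, so all other values follow by scaling. */
static const double kQstepBase[6] = { 0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125 };

/* Hypothetical helper: step size for an 8-bit H.264 QP in [0, 51]. */
double h264_qstep(int qp)
{
    assert(qp >= 0 && qp <= 51);
    return kQstepBase[qp % 6] * (double)(1 << (qp / 6));
}
```

Under these assumptions, lowering QP from 30 to 24 halves the step size (from 20 to 10), which is why even a modest negative offset can visibly help a region.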

Take a look at the x264 API related to ROI encoding:

typedef struct x264_image_properties_t
{
    /* All arrays of data here are ordered as follows:
     * each array contains one offset per macroblock, in raster scan order.  In interlaced
     * mode, top-field MBs and bottom-field MBs are interleaved at the row level.
     * Macroblocks are 16x16 blocks of pixels (with respect to the luma plane).  For the
     * purposes of calculating the number of macroblocks, width and height are rounded up to
     * the nearest 16.  If in interlaced mode, height is rounded up to the nearest 32 instead. */


    /* In: an array of quantizer offsets to be applied to this image during encoding.
     *     These are added on top of the decisions made by x264.
     *     Offsets can be fractional; they are added before QPs are rounded to integer.
     *     Adaptive quantization must be enabled to use this feature.  Behavior if quant
     *     offsets differ between encoding passes is undefined. */
    float *quant_offsets;
    /* In: optional callback to free quant_offsets when used.
     *     Useful if one wants to use a different quant_offset array for each frame. */
    void (*quant_offsets_free)( void* );
    /* ... (remaining fields omitted in this excerpt) ... */
} x264_image_properties_t;

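quant_offsets is a table whose length equals the number of macroblocks in the frame. By passing the encoder a QP offset for each macroblock, you adjust the quality of the corresponding macroblocks, which is exactly what ROI encoding is.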
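As a rough sketch of how such a table could be filled for one rectangular region, the code below follows the layout described in the header comment (one float per 16x16 macroblock, raster-scan order, dimensions rounded up to 16). It covers progressive content only. The names mb_count and roi_make_quant_offsets are hypothetical helpers, not x264 functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of 16x16 macroblocks along one dimension, rounding up.
 * Progressive content only; interlaced height rounds up to 32 instead. */
static int mb_count(int pixels)
{
    return (pixels + 15) / 16;
}

/* Hypothetical helper: builds a per-macroblock offset table in raster-scan
 * order. Every macroblock overlapping the pixel rectangle
 * [left, right) x [top, bottom) gets `offset`; all others get 0.
 * Returns a heap array the caller must free, or NULL on allocation failure. */
float *roi_make_quant_offsets(int width, int height,
                              int left, int top, int right, int bottom,
                              float offset)
{
    assert(width > 0 && height > 0);
    assert(left >= 0 && top >= 0 && left < right && top < bottom);
    assert(right <= width && bottom <= height);

    int mb_w = mb_count(width);
    int mb_h = mb_count(height);
    float *offsets = calloc((size_t)mb_w * mb_h, sizeof(*offsets));
    if (!offsets)
        return NULL;

    /* First and last macroblock column/row touched by the rectangle. */
    int mb_x0 = left / 16, mb_x1 = (right - 1) / 16;
    int mb_y0 = top / 16,  mb_y1 = (bottom - 1) / 16;

    for (int y = mb_y0; y <= mb_y1; y++)
        for (int x = mb_x0; x <= mb_x1; x++)
            offsets[y * mb_w + x] = offset;

    return offsets;
}
```

With x264 itself, the returned table would presumably be assigned to the quant_offsets field of the picture's properties, with quant_offsets_free set to free. As the header comment notes, adaptive quantization must be enabled for the offsets to take effect.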

Other encoders expose ROI interfaces similar to x264's. FFmpeg's libavcodec encoder wrappers also support ROI; the key data structure is AVRegionOfInterest, carried as AVFrame side data:

/**
 * Structure describing a single Region Of Interest.
 *
 * When multiple regions are defined in a single side-data block, they
 * should be ordered from most to least important - some encoders are only
 * capable of supporting a limited number of distinct regions, so will have
 * to truncate the list.
 *
 * When overlapping regions are defined, the first region containing a given
 * area of the frame applies.
 */
typedef struct AVRegionOfInterest {
    /**
     * Must be set to the size of this data structure (that is,
     * sizeof(AVRegionOfInterest)).
     */
    uint32_t self_size;
    /**
     * Distance in pixels from the top edge of the frame to the top and
     * bottom edges and from the left edge of the frame to the left and
     * right edges of the rectangle defining this region of interest.
     *
     * The constraints on a region are encoder dependent, so the region
     * actually affected may be slightly larger for alignment or other
     * reasons.
     */
    int top;
    int bottom;
    int left;
    int right;
    /**
     * Quantisation offset.
     *
     * Must be in the range -1 to +1.  A value of zero indicates no quality
     * change.  A negative value asks for better quality (less quantisation),
     * while a positive value asks for worse quality (greater quantisation).
     *
     * The range is calibrated so that the extreme values indicate the
     * largest possible offset - if the rest of the frame is encoded with the
     * worst possible quality, an offset of -1 indicates that this region
     * should be encoded with the best possible quality anyway.  Intermediate
     * values are then interpolated in some codec-dependent way.
     *
     * For example, in 10-bit H.264 the quantisation parameter varies between
     * -12 and 51.  A typical qoffset value of -1/10 therefore indicates that
     * this region should be encoded with a QP around one-tenth of the full
     * range better than the rest of the frame.  So, if most of the frame
     * were to be encoded with a QP of around 30, this region would get a QP
     * of around 24 (an offset of approximately -1/10 * (51 - -12) = -6.3).
     * An extreme value of -1 would indicate that this region should be
     * encoded with the best possible quality regardless of the treatment of
     * the rest of the frame - that is, should be encoded at a QP of -12.
     */
    AVRational qoffset;
} AVRegionOfInterest;
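The arithmetic in the doc comment above can be sketched as follows. This is only an illustration of the documented example (qoffset scaled by the full QP range). The Rational type stands in for AVRational so the snippet compiles without FFmpeg headers, and roi_qp_delta is a made-up name. Individual encoder wrappers may scale and clip differently.

```c
#include <assert.h>

/* Stand-in for AVRational so this sketch needs no FFmpeg headers. */
typedef struct Rational { int num; int den; } Rational;

/* Hypothetical helper: maps a qoffset in [-1, +1] to a QP delta by scaling
 * it with the full QP range, as in the worked example in the doc comment. */
double roi_qp_delta(Rational qoffset, int qp_min, int qp_max)
{
    assert(qoffset.den != 0 && qp_max > qp_min);
    double q = (double)qoffset.num / qoffset.den;
    /* Clamp to the documented range. */
    if (q < -1.0) q = -1.0;
    if (q >  1.0) q =  1.0;
    return q * (qp_max - qp_min);
}
```

For 10-bit H.264 (QP from -12 to 51), a qoffset of -1/10 gives a delta of about -6.3, so a frame encoded around QP 30 would see the region land near QP 24, matching the comment.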

You can check which encoders implement ROI:

grep 'AVRegionOfInterest' -rl libavcodec/

libavcodec/libx264.c
libavcodec/libvpxenc.c
libavcodec/libx265.c
libavcodec/vaapi_encode.c
libavcodec/qsvenc.c

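3. Testing ROI Encoding

The FFmpeg command line can be used to test ROI encoding. libavfilter has AI support that can do face detection, but when I tried it just now, OpenVINO crashed... libavfilter also has an addroi filter that marks a manually specified region, which is simpler for testing.

A quality improvement does not always produce a clearly visible difference; a deliberate degradation is much easier to spot. For example: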

./ffmpeg -i input.mp4 -an -c:v libx264 -vf addroi=x=350:y=270:w=300:h=400:qoffset=0.9 -frames 1 output.mp4

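This degrades the region [x, y, x + w, y + h] by setting qoffset = 0.9, which all but erases the content there. The result is as follows:

[Image: encoded frame with the ROI region heavily degraded]

In real production use, ROI encoding usually adjusts QP only slightly.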

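4. Analyzing ROI Encoding

As noted above, a quality improvement only nudges QP slightly, so its effect needs careful comparison. Besides judging the subjective result directly, we also want to look at how QP changes. A professional bitstream analysis tool can show this; without one, FFmpeg itself can help.

Here is a quality-improvement example: change the earlier command to use qoffset=-0.5 (for illustration only):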

./ffmpeg -i input.mp4 -an -c:v libx264 -vf addroi=x=350:y=270:w=300:h=400:qoffset=-0.5 -frames 3 out.mp4

Print the QP values:

./ffmpeg -threads 1 -debug qp -i out.mp4 -f null -

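As you can see, the QP in the ROI region is very low. Zooming in:

[Image: QP printout, zoomed in on the ROI region]

Note that FFmpeg's -debug qp currently supports only H.264 and MPEG, not H.265 (TODO). Also, printing coordinates alongside the QP values is a feature I added recently, so testing with an FFmpeg nightly build is recommended.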
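Reading the printout by eye works for a frame or two. For a more systematic comparison, one option is to average QP inside and outside the region. The sketch below assumes the per-macroblock QP values have already been collected into an array in raster-scan order. Parsing the -debug qp log is left out because its exact format is not shown here. QpStats and roi_qp_stats are hypothetical names, and x, y, w, h have the same meaning as the addroi options.

```c
#include <assert.h>

typedef struct QpStats {
    double roi_mean;      /* mean QP of macroblocks overlapping the ROI */
    double outside_mean;  /* mean QP of all other macroblocks */
} QpStats;

/* Hypothetical helper. qp holds mb_w * mb_h per-macroblock QP values in
 * raster-scan order; x, y, w, h describe the ROI rectangle in pixels. */
QpStats roi_qp_stats(const int *qp, int mb_w, int mb_h,
                     int x, int y, int w, int h)
{
    assert(qp && mb_w > 0 && mb_h > 0);
    assert(x >= 0 && y >= 0 && w > 0 && h > 0);

    /* Macroblock columns/rows touched by the rectangle (16x16 blocks). */
    int x0 = x / 16, x1 = (x + w - 1) / 16;
    int y0 = y / 16, y1 = (y + h - 1) / 16;

    long roi_sum = 0, out_sum = 0;
    int roi_n = 0, out_n = 0;

    for (int my = 0; my < mb_h; my++) {
        for (int mx = 0; mx < mb_w; mx++) {
            int v = qp[my * mb_w + mx];
            if (mx >= x0 && mx <= x1 && my >= y0 && my <= y1) {
                roi_sum += v;
                roi_n++;
            } else {
                out_sum += v;
                out_n++;
            }
        }
    }

    QpStats s;
    s.roi_mean = roi_n ? (double)roi_sum / roi_n : 0.0;
    s.outside_mean = out_n ? (double)out_sum / out_n : 0.0;
    return s;
}
```

If the ROI is working, roi_mean should come out lower than outside_mean for a negative qoffset, and higher for a positive one.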

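The codecview filter in FFmpeg's libavfilter can visualize QP using the chroma planes, but only for MPEG codecs. I tried adding H.264 support, but the display was so poor that reading the printed QP values was more useful, so I dropped it.

Overlaying the QP values on the image with drawtext might work better (TODO).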