A novel macroblock-tree algorithm for high-performance optimization

Author: 你好，请叫我靓仔

Published 2021-03-21 20:49:15

Category: Literature reading

Article link: https://blog.csdn.net/DuoKingg/article/details/115051313

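This post introduces a novel macroblock-tree algorithm in the x264 encoder. At low computational cost, it improves PSNR and SSIM noticeably over earlier fast ratecontrol algorithms. By analyzing how macroblock information propagates between frames, the algorithm optimizes quantizer selection and allocates bits intelligently, which matters most in high-compression video coding. The paper's experimental results show good quality and efficiency in practice.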

Overview

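This paper was written by an x264 developer and reads much like an engineering design document. It mainly covers the principle and implementation of x264's macroblock-tree algorithm.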

Translation

0 Abstract

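We present a fast heuristic algorithm for approximating rate-distortion-optimal dependent video coding. Earlier algorithms for this problem assume a constant quantizer within each frame and rely on impractically slow methods, such as re-encoding each frame dozens of times. At very low computational cost, the new macroblock-tree approach improves PSNR by up to 1.2 dB and SSIM by up to 2.3 dB over existing fast ratecontrol algorithms. Macroblock-tree has been implemented in the open-source H.264/AVC encoder x264.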

A novel fast heuristic algorithm for approximating rate-distortion-optimal dependent video coding is presented. Previous algorithms that solve this problem assume constant quantizer within each frame and rely on impractically slow approaches, such as dozens of repeated encodes of each frame. This novel macroblock-tree approach provides PSNR improvements of up to 1.2 dB and SSIM improvements of up to 2.3 dB over existing fast ratecontrol algorithms at very low computational cost. Macroblock-tree has been implemented in and committed to the open source H.264/AVC encoder x264.

1 Introduction

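Intelligent allocation of bits across a sequence of frames is key to achieving high compression in video coding. The standard way to optimize this tradeoff is rate-distortion optimization. However, finding a rate-distortion-optimal bit allocation across multiple dependent frames is generally infeasible because the problem is so complex.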

Intelligent bit allocation across a sequence of frames is critical to achieving high rates of compression in video coding. The standard approach to optimizing this tradeoff is rate-distortion optimization.[1] However, finding a rate-distortion-optimal solution for bit allocation across multiple dependent frames is typically infeasible due to its high complexity.

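Most existing methods, heuristic or optimal, are either impractically complex or give only a small compression gain. Moreover, despite their complexity, most of them still assume a constant quantizer within each frame. We propose a new macroblock-tree algorithm that optimizes per-block quantizer selection across multiple dependent frames at negligible computational cost.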

Most existing solutions, both heuristic and optimal, have impractically high complexity or provide only a small compression improvement. Furthermore, despite their high complexity, most existing solutions still assume a constant quantizer within each frame. We propose a novel macroblock-tree algorithm to optimize per-block quantizer selection across multiple dependent frames at negligible computational cost.

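The paper is organized as follows. Section 2 gives background on rate-distortion-optimal ratecontrol and existing heuristics. Section 3 gives a high-level overview of the macroblock-tree algorithm and its purpose. Section 4 explains x264's lookahead framework, on which the macroblock-tree implementation is built. Section 5 introduces the macroblock-tree algorithm itself. Section 6 analyzes the typical consequences of the algorithm. Section 7 discusses perceptual considerations related to macroblock-tree. Section 8 presents numerical quality results and section 9 presents performance analysis. Section 10 concludes the paper.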

This paper is organized as follows. Section 2 provides background for the problem of rate-distortion-optimal ratecontrol and existing heuristics. Section 3 gives a high-level overview of the macroblock-tree algorithm and its purpose. Section 4 explains the lookahead framework of x264 which was used as a basis for our implementation of the macroblock-tree algorithm. Section 5 introduces the macroblock-tree algorithm itself. Section 6 contains an analysis of the typical consequences of the algorithm. Perceptual considerations related to the macroblock-tree are discussed in section 7. Numerical quality results are presented in section 8, with performance analysis in section 9. The paper is concluded in section 10.

2 Background

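The simplest possible ratecontrol algorithm, given a set of constraints, tries to hit a constant quantizer target. Early in the development of video coding it was recognized that this is suboptimal: varying the quantizers of frames with rate-distortion optimization techniques improves quality at a given bitrate, where quality is usually measured as PSNR or sometimes SSIM.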

The simplest possible ratecontrol method is one which, given some set of constraints, attempts to target a constant quantizer. It was realized very early in the development of video coding techniques that this was suboptimal: by varying the quantizers of frames using rate-distortion optimization techniques, one could improve quality, usually measured in the form of Peak Signal-to-Noise Ratio (PSNR) or sometimes Structural Similarity (SSIM), at a given rate.
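Since the rest of the paper reports gains in PSNR, it may help to see how that metric is computed. The following is a minimal sketch using the standard definition of PSNR for 8-bit samples; the function name `psnr` and the flat-list input format are assumptions made for illustration and are not taken from x264.

```python
import math


def psnr(reference, decoded, peak=255):
    """Return PSNR in dB between two equally sized sequences of samples.

    `peak` is the maximum sample value (255 for 8-bit video).
    """
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: distortion is zero
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [100, 120, 140, 160]
    reconstructed = [101, 119, 141, 159]  # every sample is off by one
    print(f"PSNR: {psnr(original, reconstructed):.2f} dB")
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.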

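Early algorithms for this problem were generally brute-force: they tried many quantizers for each frame and picked the best. Inter-frame compression complicated matters, because the number of possible quantizer combinations grows exponentially and frames can no longer be optimized independently. Viterbi algorithms were used to solve this.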

Early algorithms to optimize this problem were typically brute-force, trying many quantizers for each frame in an attempt to pick the best one. The advent of inter-frame compression complicated the matter, as the number of possible quantizer combinations grew exponentially and frames could no longer be optimized independently. This was solved using Viterbi algorithms, as in Ramchandran et al.[2]

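However, although Viterbi algorithms made the optimal solution tractable, they were still very slow and in some cases still had exponential worst-case convergence time. A “fast” algorithm proposed by Sermadevi et al. runs in O(Q × N × M) time, where Q is the number of quantizers searched, N is the number of frames, and M is the number of frames affected when the allocation of any given frame changes. Even with the improvements of Toivonen et al., this class of algorithms still needs dozens or hundreds of encode calls per frame, which is beyond most practical encoders.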

However, while Viterbi made the optimal solution tractable, such algorithms were still very slow and in some cases still had exponential worst-case convergence time. One “fast” algorithm by Sermadevi et al. had a runtime of O(Q × N × M), where Q is the number of quantizers to search, N is the number of frames, and M is the number of frames affected by a change in allocation to any given frame.[3] Even as improved in Toivonen et al., this class of algorithms still typically took dozens or hundreds of encode calls per frame[4], putting it out of the reach of most practical encoders.
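To get a feel for these costs, the sketch below compares the size of an exhaustive search over per-frame quantizers with the O(Q × N × M) bound. It treats the bound as an exact count with a constant factor of 1, which is a simplifying assumption; the function names and the sample values of Q, N, and M are hypothetical and are not taken from the cited papers.

```python
def brute_force_combinations(q, n):
    """Number of quantizer assignments when each of n dependent frames
    can take any of q quantizers and frames cannot be optimized alone."""
    return q ** n


def fast_search_encode_calls(q, n, m):
    """Encode calls implied by an O(Q x N x M) search, assuming a
    constant factor of 1 (an illustrative simplification)."""
    return q * n * m


if __name__ == "__main__":
    q, n, m = 20, 250, 10  # hypothetical values for illustration
    exhaustive = brute_force_combinations(q, n)
    fast = fast_search_encode_calls(q, n, m)
    print(f"exhaustive search: about 10^{len(str(exhaustive)) - 1} combinations")
    print(f"fast search: {fast} encode calls, {fast // n} per frame")
```

Even under these generous assumptions the fast search needs hundreds of encodes per frame, which is consistent with the paper's remark that this class of algorithms is out of reach for most practical encoders.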

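Nevertheless, many simple heuristics came out of this research and are used in practical encoders. One, described by Ramchandran et al., is that the I-frame is the most important frame in the group of pictures and must not be compromised; in other words, I-frames should get higher quality than other frames. Another common heuristic assigns lower quality to unreferenced B-frames, because their pixels are not reused for prediction.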

Nevertheless, many simple heuristics have been derived from this research and used in practical encoders. One described in Ramchandran et al. is that the “… I-frame is the most important of the group of pictures and must not be compromised,” in other words, that I-frames should be given higher quality than other frames.[2] Another common heuristic is assigning lower quality to unreferenced B-frames, as their pixels are not reused for prediction.

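These heuristics are very common in modern video encoders. x264 in particular uses an I-frame offset of 1.4, meaning 1.4x higher quality than P-frames on a linearized quantizer scale; in H.264 this corresponds to about -3 QP. Similarly, x264 uses a B-frame offset of 1.3, meaning 1.3x lower quality than P-frames, or about +2 QP.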

These heuristics are extremely common in modern video encoders. x264 in particular uses an I-frame offset of 1.4: 1.4x higher quality than P-frames, measured in linearized quantizer scale. In H.264, this maps to approximately -3 QP. Similarly, x264 uses a B-frame offset of 1.3, or 1.3x lower quality than P-frames, approximately +2 QP.
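The QP figures quoted above follow from the fact that the H.264 quantizer step size doubles for every increase of 6 in QP, so a ratio r on the linear quantizer scale corresponds to a QP difference of 6·log2(r). The sketch below checks the two offsets; the function name `qscale_ratio_to_qp_delta` is a hypothetical helper, not an x264 API.

```python
import math


def qscale_ratio_to_qp_delta(ratio):
    """Convert a ratio on the linear quantizer scale to a QP difference,
    using the H.264 rule that the step size doubles every 6 QP."""
    return 6.0 * math.log2(ratio)


if __name__ == "__main__":
    # I-frames: quantizer scale 1.4x smaller than P-frames, so QP goes down.
    i_frame_delta = -qscale_ratio_to_qp_delta(1.4)
    # B-frames: quantizer scale 1.3x larger than P-frames, so QP goes up.
    b_frame_delta = qscale_ratio_to_qp_delta(1.3)
    print(f"I-frame offset 1.4 -> {i_frame_delta:+.2f} QP")
    print(f"B-frame offset 1.3 -> {b_frame_delta:+.2f} QP")
```

The results, roughly -2.9 and +2.3, round to the -3 QP and +2 QP given in the text.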

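Another common heuristic is “quantizer curve compression”, or “qcomp”. qcomp tries to compensate for the differences in RD curves between frames without the cost of computing the actual RD curves. It does so by exploiting the correlation between a frame's inter residual and its importance for predicting future frames.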

Another common heuristic is known as “quantizer curve compression”, or “qcomp”. qcomp attempts to compensate for the variance in RD curves among frames without the complexity of calculating the actual RD curves. It does this by leveraging the correlation between the inter residual of a frame and its importance for predicting future frames.

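Inter prediction is typically less useful in parts of the video with high inter residual, so a higher-quality reference frame is worth less there. qcomp therefore adjusts frame quality in inverse proportion to the inter residual. The algorithm was originally invented for libavcodec's MPEG video encoder. x264's qcomp measures the SATD residuals of frames, applies a Gaussian blur to the residuals to limit local variation, and then multiplies the quality of every frame by (SATD residual)^0.4. Combined with the constant I-frame and B-frame offset heuristics, qcomp approximates the effect of a much slower RD-optimal ratecontrol algorithm at negligible computational cost.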

Typically inter prediction is less useful in sections of video with high inter residual, and thus the value of a higher quality reference frame is lower. As such, qcomp adjusts the quality of frames in inverse proportion to their inter residual. This algorithm was originally invented for use in libavcodec’s MPEG video encoder. x264’s implementation of qcomp measures the Sum of Absolute Hadamard-Transformed Differences (SATD) residuals of frames, performs a Gaussian blur over the residuals to limit local variation, then multiplies the quality of all frames by (SATD residual)^0.4. Combined with heuristics such as constant I-frame and B-frame offsets, qcomp helps approximate the effect of a much slower RD-optimal ratecontrol algorithm with negligible computational cost.[5]
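The qcomp steps described above (measure per-frame SATD, blur, apply a power of 0.4) can be sketched roughly as follows. This is not x264's actual code: the blur radius and sigma, the function names, and the sample SATD values are all assumptions. The sketch also reads the resulting factor as a scale on the quantizer, so that frames with higher residual are quantized more coarsely, which is one interpretation of the paper's wording.

```python
import math


def gaussian_blur(values, sigma=2.0, radius=4):
    """Blur a 1-D sequence with a truncated, renormalized Gaussian kernel."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    blurred = []
    for i in range(len(values)):
        acc = 0.0
        weight_sum = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # ignore taps outside the sequence
                acc += w * values[j]
                weight_sum += w
        blurred.append(acc / weight_sum)
    return blurred


def qcomp_factors(satd_costs, exponent=0.4):
    """Per-frame factor (blurred SATD residual) ** exponent."""
    return [c ** exponent for c in gaussian_blur(satd_costs)]


if __name__ == "__main__":
    # Hypothetical per-frame SATD costs: a calm scene, then a busy one.
    satd = [1000.0] * 10 + [8000.0] * 10
    factors = qcomp_factors(satd)
    print(f"calm scene factor: {factors[0]:.2f}")
    print(f"busy scene factor: {factors[-1]:.2f}")
```

The exponent compresses the range: an 8x difference in residual becomes only about a 2.3x difference in the factor, and the blur keeps the factor from jumping abruptly at the scene boundary.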

In 2003, Toivone