Understanding MGS in H.264 SVC: Structure

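This article explains how the MGS (Medium Grain Scalability) structure works and how it is realized, covering the key concept of the reference base picture and its role in coding, together with the use of the related syntax elements store_ref_base_pic_flag and use_ref_base_pic_flag.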

from：http://forum.byr.edu.cn/pc/pccon.php?id=1727&nid=75114&order=&tid=1329

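Everything below is my personal understanding from studying the topic. If you find any errors or problems, please send me a site message or email me (wh3124$163.com, replacing $ with @). In this article the term "reference frame" always means a reference for inter-frame prediction, i.e. a reference for P and B pictures, not a reference for inter-layer prediction.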

1. Getting to know the MGS structure

To understand the MGS structure, first understand Fig. 8 in "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard". That figure shows the possible choices for inter prediction under SNR scalability:

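(a) Base layer only: very simple, all inter prediction is done in the SNR BL. The drawback is poor coding efficiency. Every reference frame is an SNR BL picture, i.e. the least "sharp" picture. With low-quality references the residual is larger, so more bits are produced.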

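(b) Enhancement layer only: overcomes the drawback of (a). All reference frames are the "sharpest" pictures, so coding efficiency is the best. This structure still has a problem: if the decoder receives only the SNR BL and the SNR EL is lost, decoding obviously goes wrong, and the error keeps propagating until the next IDR. This is the drift effect.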

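(c) Two-loop: a natural idea for overcoming the problems of (a) and (b) is to let each layer predict from its own reconstruction. The base layer is then unaffected by enhancement-layer losses, but the structure clearly still suffers from drift: once SNR EL data is lost, the enhancement-layer reconstruction drifts.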

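(d) Key picture: the method used by the SVC standard, in effect a compromise between (a) and (b). A key picture is an access unit whose temporal id is 0 (hatched in the figure). Key pictures may only use the SNR BL for inter prediction, exactly as in (a); non-key pictures always use the "sharpest" picture as the reference frame, exactly as in (b). A quick analysis: in terms of coding efficiency, most pictures use the sharpest picture as reference, so efficiency is good; in terms of drift control, no error caused by dropping SNR EL data can propagate past the next key picture, so that is handled well too.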
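The selection rule behind schemes (a), (b) and (d) can be written down in a few lines. The following is a minimal sketch, not code from any encoder: the `Picture` type, the `reference_layer` helper and the temporal ids of the sample GOP are assumptions made for illustration, and scheme (c) is left out because each layer there simply uses its own loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    index: int        # position in display order, used only as a label
    temporal_id: int  # 0 marks a key picture


def reference_layer(pic: Picture, scheme: str) -> str:
    """Return "BL" or "EL": the reconstruction used as inter-prediction reference."""
    if scheme == "a":   # base layer only
        return "BL"
    if scheme == "b":   # enhancement layer only
        return "EL"
    if scheme == "d":   # key picture concept
        return "BL" if pic.temporal_id == 0 else "EL"
    raise ValueError(f"unsupported scheme: {scheme!r}")


# A key picture, then one dyadic GOP of size 4 (temporal ids are assumed).
gop = [Picture(0, 0), Picture(1, 2), Picture(2, 1), Picture(3, 2), Picture(4, 0)]
choices = [reference_layer(p, "d") for p in gop]
print(choices)
```

Only the two pictures with temporal id 0 fall back to the base layer; for an intra-coded key picture the choice has no effect, since it uses no reference at all.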

2. The concept of a reference base picture

With the motivation clear, let us look at how the SVC standard realizes such an MGS structure. First, a concept from the standard, the reference base picture:

G.3.46 reference base picture: A reference picture that is obtained by decoding a base quality layer representation with the nal_ref_idc syntax element not equal to 0 and the store_ref_base_pic_flag syntax element equal to 1 of an access unit and all layer representations of the access unit that are referred to by inter-layer prediction of the base quality layer representation. A reference base picture is not an output of the decoding process, but the samples of a reference base picture may be used for inter prediction in the decoding process of subsequent pictures in decoding order. Reference base picture is a collective term for a reference base field or a reference base frame.

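Put simply, it is a reference picture with QId equal to 0. It may not be the final output picture (the final output may be SNR BL + SNR EL), but in certain situations during decoding it serves as a reference frame. The following example helps: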

QId = 1: I1 b B1 b P1 ...

QId = 0: I0 b B0 b P0 ... (example-1)

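Which pictures are reference base pictures? Following the definition of a reference base picture, I0, P0 and B0 clearly are; I1, P1 and B1 are not (QId is not 0), and neither are the b pictures (they are not reference frames). It is also easy to see that the picture we finally output is the "sharpest" SNR BL + EL picture, so I0, P0 and B0 are never output. They may, however, be used as references. When? This is where the MGS key picture comes in. Looking at figure (d) above, I0 is used as the reference frame when P0 is decoded. That is what a reference base picture means.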
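The classification of example-1 can be reproduced with a small filter. This is a sketch of the simplified reading used above (QId equal to 0 and nal_ref_idc not equal to 0); the `LayerRep` type, the helper name and the concrete nal_ref_idc values are assumptions for illustration. The full definition additionally requires store_ref_base_pic_flag equal to 1, which is discussed in section 3.

```python
from typing import NamedTuple


class LayerRep(NamedTuple):
    name: str
    quality_id: int
    nal_ref_idc: int  # 0 means "not used for reference"


def is_base_quality_reference(rep: LayerRep) -> bool:
    """Simplified test: a base quality (QId = 0) representation of a reference picture."""
    return rep.quality_id == 0 and rep.nal_ref_idc != 0


# Example-1, with one representative non-reference b picture per quality layer.
example_1 = [
    LayerRep("I0", 0, 1), LayerRep("b(Q0)", 0, 0),
    LayerRep("B0", 0, 1), LayerRep("P0", 0, 1),
    LayerRep("I1", 1, 1), LayerRep("b(Q1)", 1, 0),
    LayerRep("B1", 1, 1), LayerRep("P1", 1, 1),
]
candidates = [rep.name for rep in example_1 if is_base_quality_reference(rep)]
print(candidates)
```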

3. The syntax element store_ref_base_pic_flag

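Going one step further: in a key picture the reference base picture acts as a reference frame, so although it is never output it still has to be stored in the DPB. The reference base picture of a non-key picture (for example B0) is never used as a reference during decoding, so it should not be placed in the DPB where it would only waste resources. This is where the standard's syntax element store_ref_base_pic_flag comes in: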

store_ref_base_pic_flag equal to 1 specifies that, when the value of dependency_id as specified in the NAL unit header is equal to the maximum value of dependency_id for the VCL NAL units of the current access unit, an additional representation of the coded picture that may or may not be identical to the decoded picture is marked as "used for reference". This additional representation is also referred to as reference base picture and may be used for inter prediction of following pictures in decoding order, but it is not output.

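Put simply, this syntax element specifies whether the reference base picture carried by the current NALU is to be stored in the DPB for future reference. Looking more closely, store_ref_base_pic_flag normally appears only in NALUs with QId = 0. Returning to the example above, store_ref_base_pic_flag should clearly be 1 for I0 and P0, and 0 for B0.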
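The effect on the DPB can be illustrated with a toy model. The sketch below is an assumption-laden simplification: `AccessUnit`, `update_dpb` and the string labels are hypothetical, and the real reference picture marking process (sliding window, memory management commands) is ignored entirely. It only shows which pictures get an extra base representation.

```python
from typing import List, NamedTuple


class AccessUnit(NamedTuple):
    name: str
    nal_ref_idc: int              # 0 means non-reference picture
    store_ref_base_pic_flag: int  # as signalled with the QId = 0 NAL unit


def update_dpb(dpb: List[str], au: AccessUnit) -> None:
    """Append what this access unit contributes to the (never emptied) toy DPB."""
    if au.nal_ref_idc == 0:
        return  # non-reference picture: nothing is kept for prediction
    dpb.append(f"{au.name}:decoded")
    if au.store_ref_base_pic_flag == 1:
        dpb.append(f"{au.name}:base")  # extra copy, never output


dpb: List[str] = []
for au in [AccessUnit("I", 1, 1), AccessUnit("P", 1, 1),
           AccessUnit("B", 1, 0), AccessUnit("b", 0, 0)]:
    update_dpb(dpb, au)
print(dpb)
```

Only the key pictures I and P occupy two entries; B keeps just its decoded picture, and b keeps nothing.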

4. The syntax element use_ref_base_pic_flag

Next, another syntax element, use_ref_base_pic_flag:

use_ref_base_pic_flag equal to 1 specifies that reference base pictures are used as reference pictures for the inter prediction process. use_ref_base_pic_flag equal to 0 specifies that decoded pictures are used as reference pictures during the inter prediction process.

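This syntax element belongs to nal_unit_header_svc_extension. Its purpose is to indicate what the current NALU uses as reference. Note that use_ref_base_pic_flag must be identical for all NALUs with the same dependency ID. Another example illustrates this: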

QId = 1: I1 B1 P1 ...

QId = 0: I0 B0 P0 ... (example-2)

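Assume the decoder receives both BL and EL. When P0 is decoded, there are two candidate reference frames: 1) I0, the reference base picture; 2) I0+I1, the decoded picture. If we are using MGS, P0 is clearly a key picture and must use I0 as reference. The use_ref_base_pic_flag of P0 should therefore be 1. Following the standard, the use_ref_base_pic_flag of P1 is 1 as well.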

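Now consider decoding B0. According to the MGS structure, B0 should use I0+I1 and P0+P1, i.e. the decoded pictures, as references, so the use_ref_base_pic_flag of B0 should be 0. By the same reasoning, the use_ref_base_pic_flag of B1 is also 0. What exactly does use_ref_base_pic_flag equal to 0, i.e. using decoded pictures as references, mean? In example-2 above we assumed that both BL and EL reach the decoder. What if the EL is lost? B0 is then decoded using the only pictures available, I0 and P0, as reference frames. That is what "decoded pictures" means. B0 and B1 decoded this way obviously contain errors, but the drift does not spread beyond P0.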
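The argument about where drift can occur boils down to comparing the reference the encoder used with the reference the decoder has when the EL is lost. The following sketch does just that; `reference_for` and `has_mismatch` are hypothetical helpers, and it assumes the encoder always had both layers available.

```python
def reference_for(ref_name: str, use_ref_base_pic_flag: int, el_received: bool) -> str:
    """Describe which reconstruction of picture `ref_name` serves as reference."""
    base = f"{ref_name}0"
    if use_ref_base_pic_flag == 1:
        return base  # reference base picture, independent of the EL
    # "Decoded picture": the best reconstruction the decoder actually has.
    return f"{base}+{ref_name}1" if el_received else base


def has_mismatch(use_ref_base_pic_flag: int) -> bool:
    """True if losing the EL changes the reference relative to the encoder's."""
    encoder_ref = reference_for("I", use_ref_base_pic_flag, el_received=True)
    decoder_ref = reference_for("I", use_ref_base_pic_flag, el_received=False)
    return encoder_ref != decoder_ref


print("P0 (flag 1) mismatch:", has_mismatch(1))
print("B0 (flag 0) mismatch:", has_mismatch(0))
```

The key picture P0 sees the same reference with or without the EL, which is why errors in B0 and B1 cannot travel past it.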

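A side topic: for I pictures use_ref_base_pic_flag is meaningless, because they need no reference frames. Why, then, does this syntax element still appear in I NALUs and add 1 bit of redundancy? My guess is the following. As mentioned, use_ref_base_pic_flag is a syntax element of nal_unit_header_svc_extension. While an SVC bitstream is being transmitted, some adapting nodes in the network perform bitstream extraction. They parse the header of every NALU and decide which NALUs to drop. Such nodes usually handle a large number of packets, so we want the adaptation to be as simple and cheap as possible, and a uniform NALU header structure clearly suits that best. If the syntax element were removed from I NALUs, the adapting node would need one more check.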
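To make the fixed-layout argument concrete, here is a sketch of how an adapting node could read the header. It assumes the bit layout of nal_unit_header_svc_extension as I recall it from Annex G (NAL unit types 14 and 20, three extension bytes after the one-byte NAL header); the function name and the sample bytes are made up for illustration, and emulation prevention and start codes are not handled.

```python
from typing import Dict


def parse_svc_nal_header(data: bytes) -> Dict[str, int]:
    """Parse the 1-byte NAL header plus the 3-byte SVC extension header."""
    if len(data) < 4:
        raise ValueError("need at least 4 header bytes")
    b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
    nal_unit_type = b0 & 0x1F
    if nal_unit_type not in (14, 20):
        raise ValueError("not a prefix NAL unit or SVC slice extension")
    if (b1 >> 7) != 1:
        raise ValueError("svc_extension_flag is 0")
    return {
        "nal_ref_idc": (b0 >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        "idr_flag": (b1 >> 6) & 0x01,
        "priority_id": b1 & 0x3F,
        "no_inter_layer_pred_flag": (b2 >> 7) & 0x01,
        "dependency_id": (b2 >> 4) & 0x07,
        "quality_id": b2 & 0x0F,
        "temporal_id": (b3 >> 5) & 0x07,
        "use_ref_base_pic_flag": (b3 >> 4) & 0x01,
        "discardable_flag": (b3 >> 3) & 0x01,
        "output_flag": (b3 >> 2) & 0x01,
    }


# Hand-built sample: type 20, D=0, Q=1, T=0, use_ref_base_pic_flag=1.
hdr = parse_svc_nal_header(bytes([0x54, 0x80, 0x01, 0x17]))
forward = hdr["quality_id"] <= 0  # an extractor keeping only the base quality
print(hdr["quality_id"], hdr["use_ref_base_pic_flag"], forward)
```

Every field sits at a fixed bit position regardless of slice type, so the drop decision needs no knowledge of whether the NALU carries an I, P or B slice.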

5. A worked example

That essentially completes the description of the MGS structure and how the standard realizes it. Here is an example:

cfg for main:

# JSVM Main Configuration File
OutputFile              test.264   # Bitstream file
FrameRate               15.0       # Maximum frame rate [Hz]
FramesToBeEncoded       5          # Number of frames (at input frame rate)
GOPSize                 4          # GOP Size (at maximum frame rate)
IntraPeriod             -1
BaseLayerMode           2          # Base layer mode (0,1: AVC compatible,
                                   #                    2: AVC w subseq SEI)
CgsSnrRefinement        1          # SNR refinement as 1: MGS; 0: CGS
EncodeKeyPictures       1          # Key pics at T=0 (0:none, 1:MGS, 2:all)
MGSControl              2          # ME/MC for non-key pictures in MGS layers
                                   # (0:std, 1:ME with EL, 2:ME+MC with EL)
SearchMode              4          # Search mode (0:BlockSearch, 4:FastSearch)
SearchRange             32         # Search range (Full Pel)
NumLayers               2          # Number of layers
LayerCfg                my_layer0.cfg # Layer configuration file
LayerCfg                my_layer1.cfg # Layer configuration file

cfg for layer 0:

# JSVM Layer Configuration File
InputFile               FOREMAN_352x288_15_orig_01.yuv # Input file
SourceWidth             352        # Input frame width
SourceHeight            288        # Input frame height
FrameRateIn             15         # Input frame rate [Hz]
FrameRateOut            15         # Output frame rate [Hz]
SymbolMode              1          # 0=CAVLC, 1=CABAC

cfg for layer 1:

# JSVM Layer Configuration File
InputFile               FOREMAN_352x288_15_orig_01.yuv # Input file
SourceWidth             352        # Input frame width
SourceHeight            288        # Input frame height
FrameRateIn             15         # Input frame rate [Hz]
FrameRateOut            15         # Output frame rate [Hz]
SymbolMode              1          # 0=CAVLC, 1=CABAC
InterLayerPred          2          # Inter-layer Prediction (0: no, 1: yes, 2:adaptive)
MGSVectorMode           1          # MGS vector usage selection
MGSVector0              8          # Specifies 0th position of the vector
MGSVector1              8          # Specifies 1st position of the vector
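The two MGSVector entries are what turn the single enhancement layer into the two quality layers Q1 and Q2 seen in the trace. My understanding of the JSVM manual, which should be treated as an assumption here, is that each entry gives the number of 4x4 transform coefficients (in scan order) assigned to one MGS fragment and that the entries must sum to 16. The helper below is a hypothetical illustration of that split, not JSVM code.

```python
from typing import List, Tuple


def mgs_fragments(mgs_vector: List[int], first_quality_id: int = 1) -> List[Tuple[int, range]]:
    """Map each MGS vector entry to (quality_id, scan-position range)."""
    if sum(mgs_vector) != 16:
        raise ValueError("MGS vector entries are assumed to sum to 16")
    fragments = []
    start = 0
    for offset, count in enumerate(mgs_vector):
        fragments.append((first_quality_id + offset, range(start, start + count)))
        start += count
    return fragments


fragments = mgs_fragments([8, 8])  # MGSVector0 = 8, MGSVector1 = 8
for quality_id, positions in fragments:
    print(f"Q{quality_id}: scan positions {positions.start}..{positions.stop - 1}")
```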

The following results are taken from the encoder trace file:

Structure:

Q2: I2 b B2 b P2

Q1: I1 b B1 b P1

Q0: I0 b B0 b P0

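Picture	DTQ (dependency_id, temporal_id, quality_id)	use_ref_base_pic_flag	store_ref_base_pic_flag
I0	000	1 (not meaningful)	1
I1	001	1 (not meaningful)	not present
I2	002	1 (not meaningful)	not present
P0	000	1	1
P1	001	1	not present
P2	002	1	not present
B0	010	0	0
B1	011	0	not present
B2	012	0	not present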
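The trace values follow directly from the rules discussed in sections 3 and 4. As a sanity check, the sketch below encodes those rules and compares them with the table; `expected_flags` is a hypothetical helper, `None` stands for a flag that is not present, and the rules are the simplified ones from this article (key picture means temporal id 0, store flag only with QId = 0).

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, int, Optional[int]]

# (picture, DTQ, use_ref_base_pic_flag, store_ref_base_pic_flag) from the trace.
rows: List[Row] = [
    ("I0", "000", 1, 1), ("I1", "001", 1, None), ("I2", "002", 1, None),
    ("P0", "000", 1, 1), ("P1", "001", 1, None), ("P2", "002", 1, None),
    ("B0", "010", 0, 0), ("B1", "011", 0, None), ("B2", "012", 0, None),
]


def expected_flags(dtq: str) -> Tuple[int, Optional[int]]:
    """Derive (use_ref_base_pic_flag, store_ref_base_pic_flag) from a DTQ string."""
    _dependency_id, temporal_id, quality_id = (int(c) for c in dtq)
    use_flag = 1 if temporal_id == 0 else 0          # key pictures use base references
    store_flag = use_flag if quality_id == 0 else None  # only carried with QId = 0
    return use_flag, store_flag


mismatches = [name for name, dtq, use, store in rows
              if expected_flags(dtq) != (use, store)]
print(mismatches)
```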