Direct Mode Coding For Bi-Predictive Pictures in the H.264 Standard

Article link: https://blog.csdn.net/dplion/article/details/2510762

The new H.264 (MPEG-4 AVC) video coding standard can achieve considerably higher coding efficiency compared to previous standards. This is accomplished mainly due to the consideration of variable block sizes for motion compensation, multiple reference frames, intra prediction, but also due to better exploitation of the spatiotemporal correlation that may exist between adjacent Macroblocks, with the SKIP mode in Predictive (P) slices and the two DIRECT modes in Bi-predictive (B) slices. These modes, when signaled, could in effect represent the motion of a macroblock or block without having to transmit any additional motion information required by other Inter macroblock types. This property also allows these modes to be highly compressible especially due to the consideration of Run Length Coding strategies. Although for SKIP mode spatial correlation of motion vectors from adjacent macroblocks is used to predict its motion parameters, until recently DIRECT Mode considered only temporal correlation of adjacent pictures. In this paper we introduce alternative methods for the generation of the motion information for the DIRECT Mode using spatial or combined spatiotemporal correlation. Considering that temporal correlation requires that the motion and timestamp information from previous pictures are available in both the encoder and decoder, it is shown that our spatial-only method can reduce or eliminate such requirements while, at the same time, achieving similar performance. The combined methods on the other hand, by jointly exploiting spatial and temporal correlation either at the macroblock or slice/picture level, can achieve even higher coding efficiency. Finally, improvements on the existing Rate Distortion Optimization related to B slices within the H.264 codec are also presented, which can lead to improvements of up to 16% in bitrate reduction or, equivalently, more than 0.7dB in PSNR.

Contact author:
Alexis Michael Tourapis
Thomson Inc. Corporate Research - Princeton
2 Independence Way, Princeton, NJ 08540, USA
Tel: (609) 987-7329, Fax: (609) 987-7299
Email: alexismt@ieee.org

The new H.264 (or JVT, H.26L, MPEG-4 AVC) [1] video coding standard has gained more and more attention recently, mainly due to its higher coding efficiency versus previous standards. This new standard essentially relies on several new features [2] such as the consideration of variable block sizes, ranging from 16×16 down to 4×4, and a quadtree structure for motion compensation, multiple reference frames, intra prediction, context adaptive entropy coders, in loop de-blocking filtering, but also due to the consideration of Generalized Bi-predictive (B) Slices [3]. Unlike older standards such as MPEG-2 [4] and MPEG-4 [5], B slices can use multiple predictions from pictures coming from the same direction (forward or backward) while these could also be used as references for other slices. Unfortunately, the above also implied that for this standard a considerably higher percentage of bits is needed for encoding motion information, either due to signaling block sizes, multiple references, or multiple predictions. To alleviate this problem the SKIP [2] and DIRECT modes were introduced within Predictive (P) and B pictures respectively, according to which motion is derived directly from previously encoded information, thus not having to encode any additional motion data for a macroblock (MB) or block. Motion for these modes was obtained by exploiting either spatial (SKIP) or temporal (DIRECT) correlation of the motion of adjacent MBs or pictures. Furthermore, considering the bi-predictive nature of B pictures, DIRECT mode could derive two such motion vectors pointing to different references, namely the List 0 and List 1 references, thus leading to even further performance benefit.
Additionally, since these modes can be signaled without having to transmit any residual data, and due to their very high occurrence within a video stream, Run Length Coding (RLC) strategies are employed within H.264 for coding mode information that depend on the entropy encoding scheme used and can increase efficiency even further. In particular, if several (i.e. RunN) adjacent MBs according to the scanning order are to be coded using SKIP mode or DIRECT mode without coefficients then only the actual number of such MBs needs to be coded instead of coding each one individually. On the other hand, a drawback of this method is that if no such MB exists, it is also necessary to signal a zero runlength code, which could introduce additional overhead within the bitstream. Nevertheless, due to the high occurrence of SKIP and DIRECT modes, this negative impact is more than compensated in most cases.

DIRECT mode motion parameters using temporal correlation were essentially derived for the current MB/block by considering the motion information within a co-located position in a usually subsequent reference picture or more precisely the first List 1 reference. Following the assumption that an object is moving with constant speed these parameters are scaled according to the temporal distances (Figure 1) of the reference pictures involved. The motion vectors $MV_{L0}$ and $MV_{L1}$ for a DIRECT coded block versus the motion vector $MV$ of its co-located position in the first List 1 reference were originally calculated as:

$$MV_{L0} = \frac{TR_B}{TR_D} \times MV, \tag{1}$$

$$MV_{L1} = \frac{TR_B - TR_D}{TR_D} \times MV, \tag{2}$$

but were later replaced with equations:

$$X = \left(16384 + \mathrm{abs}(TD_D / 2)\right) / TD_D, \tag{3}$$

$$\mathit{ScaleFactor} = \mathrm{clip}\left(-1024,\ 1023,\ (TD_B \times X + 32) \gg 6\right), \tag{4}$$

$$MV_{L0} = (\mathit{ScaleFactor} \times MV + 128) \gg 8, \tag{5}$$

$$MV_{L1} = MV_{L0} - MV, \tag{6}$$

which can reduce the number of divisions required since variables $X$ and $\mathit{ScaleFactor}$ can be pre-computed at the Slice/Picture level.
In the above, $TD_B$ and $TD_D$ are the temporal distances, or more precisely Picture Order Count (POC) distances [1], of the reference picture used by the List 0 motion vector of the co-located block in the List 1 picture compared to the current and the List 1 picture respectively. The List 1 reference picture and the reference in List 0 referred to by the motion vectors of the co-located block in List 1 are used as the two references of DIRECT mode. Due to the temporal nature of the derivation of the motion parameters we will refer to this DIRECT mode as Temporal DIRECT.
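
As an illustration only, the integer scaling of equations (3) to (6) can be sketched in Python as follows. The function names (`scale_factor`, `temporal_direct_mvs`) and the helper `div_trunc` are hypothetical and not taken from the standard or any reference software. The sketch assumes that the division in (3) truncates toward zero, as in C, and that the right shifts are arithmetic; motion vectors are plain (x, y) integer pairs.

```python
def clip(lo, hi, v):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def div_trunc(a, b):
    """Integer division truncating toward zero (assumed C-like behaviour)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def scale_factor(td_b, td_d):
    """Equations (3) and (4): depends only on POC distances, so it can be
    computed once per slice/picture rather than per block. td_d must be nonzero."""
    x = div_trunc(16384 + abs(td_d) // 2, td_d)        # eq. (3)
    return clip(-1024, 1023, (td_b * x + 32) >> 6)     # eq. (4)


def temporal_direct_mvs(mv, td_b, td_d):
    """Equations (5) and (6), applied to each component of the co-located MV.
    Python's >> on negative ints is an arithmetic (floor) shift."""
    s = scale_factor(td_b, td_d)
    mv_l0 = tuple((s * c + 128) >> 8 for c in mv)      # eq. (5)
    mv_l1 = tuple(a - c for a, c in zip(mv_l0, mv))    # eq. (6)
    return mv_l0, mv_l1


# B picture midway between its two references: TD_B = 1, TD_D = 2.
print(temporal_direct_mvs((8, -4), 1, 2))  # ((4, -2), (-4, 2))
```

With $TD_B = 1$ and $TD_D = 2$ the scale factor is 128, i.e. one half in 8-bit fixed point, so the List 0 vector is half of the co-located vector and the List 1 vector is its negated remainder, as the constant-speed assumption suggests. Because `scale_factor` needs only the two distances, a caller could cache it per reference pair and perform no division at the block level.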