# RTP Payload Format for High Efficiency Video Coding (HEVC)

This memo describes an RTP payload format for the video coding standard ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, both also known as High Efficiency Video Coding (HEVC) and developed by the Joint Collaborative Team on Video Coding (JCT-VC). The RTP payload format allows for packetization of one or more Network Abstraction Layer (NAL) units in each RTP packet payload as well as fragmentation of a NAL unit into multiple RTP packets. Furthermore, it supports transmission of an HEVC bitstream over a single stream as well as multiple RTP streams. When multiple RTP streams are used, a single transport or multiple transports may be utilized. The payload format has wide applicability in videoconferencing, Internet video streaming, and high-bitrate entertainment-quality video, among others.

1 Introduction

The High Efficiency Video Coding specification, formally published as both ITU-T Recommendation H.265 [HEVC] and ISO/IEC International Standard 23008-2 [ISO23008-2], was ratified by the ITU-T in April 2013; reportedly, it provides significant coding efficiency gains over H.264 [H.264].
This memo describes an RTP payload format for HEVC. It shares its basic design with the RTP payload formats of [RFC6184] and [RFC6190]. With respect to design philosophy, security, congestion control, and overall implementation complexity, it has similar properties to those earlier payload format specifications. This is a conscious choice, as at least RFC 6184 is widely deployed and generally known in the relevant implementer communities. Mechanisms from RFC 6190 were incorporated as HEVC version 1 supports temporal scalability.
In order to help the overlapping implementer community, frequently only the differences between RFCs 6184 and 6190 and the HEVC payload format are highlighted in non-normative, explanatory parts of this memo. Basic familiarity with both specifications is assumed for those parts. However, the normative parts of this memo do not require study of RFCs 6184 or 6190.

1.1. Overview of the HEVC Codec
H.264 and HEVC share a similar hybrid video codec design. In this memo, we provide a very brief overview of those features of HEVC that are, in some form, addressed by the payload format specified herein. Implementers have to read, understand, and apply the ITU-T/ISO/IEC specifications pertaining to HEVC to arrive at interoperable, well-performing implementations. Implementers should consider testing their design (including the interworking between the payload format implementation and the core video codec) using the tools provided by ITU-T/ISO/IEC, for example, conformance bitstreams as specified in [H.265.1]. Not doing so has historically led to systems that perform badly and that are not secure.
Conceptually, both H.264 and HEVC include a Video Coding Layer (VCL), which is often used to refer to the coding-tool features, and a Network Abstraction Layer (NAL), which is often used to refer to the systems and transport interface aspects of the codecs.

1.1.1. Coding-Tool Features

Similar to earlier hybrid-video-coding-based standards, including H.264, the following basic video coding design is employed by HEVC. A prediction signal is first formed by either intra- or motion-compensated prediction, and the residual (the difference between the original and the prediction) is then coded. The gains in coding efficiency are achieved by redesigning and improving almost all parts of the codec over earlier designs. In addition, HEVC includes several tools to make the implementation on parallel architectures easier. Below is a summary of HEVC coding-tool features.
One of the major tools that contributes significantly to the coding efficiency of HEVC is the use of flexible coding blocks and transforms, which are defined in a hierarchical quad-tree manner. Unlike H.264, where the basic coding block is a macroblock of fixed-size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size of 64x64. Each CTU can be divided into smaller units in a hierarchical quad-tree manner and can represent smaller blocks down to size 4x4. Similarly, the transforms used in HEVC can have different sizes, starting from 4x4 and going up to 32x32. Utilizing large blocks and transforms contributes to the major gain of HEVC, especially at high resolutions.
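To make the block-size comparison concrete, the following sketch counts how many fixed 16x16 macroblocks and how many 64x64 CTUs are needed to cover a picture, and lists the edge lengths reached by repeatedly splitting a CTU into quadrants. The function names and the 1920x1080 example resolution are illustrative assumptions, and the halving loop ignores the distinction the HEVC specification draws between coding, prediction, and transform blocks.

```python
def block_grid(width, height, block_size):
    """Return (columns, rows, total) of square blocks covering a picture.

    Partial blocks at the right and bottom edges count as whole blocks.
    """
    cols = (width + block_size - 1) // block_size   # ceiling division
    rows = (height + block_size - 1) // block_size
    return cols, rows, cols * rows


def quadtree_sizes(ctu_size=64, min_size=4):
    """Edge lengths obtained by splitting a CTU into quadrants repeatedly."""
    sizes = []
    size = ctu_size
    while size >= min_size:
        sizes.append(size)
        size //= 2          # each quad-tree split halves the edge length
    return sizes


if __name__ == "__main__":
    print(block_grid(1920, 1080, 16))   # H.264-style macroblocks
    print(block_grid(1920, 1080, 64))   # largest HEVC CTU
    print(quadtree_sizes())
```

For a 1920x1080 picture this gives 8160 macroblocks versus 510 CTUs of the maximum size.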

Entropy coding

HEVC uses a single entropy-coding engine, which is based on Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC], whereas H.264 uses two distinct entropy coding engines. CABAC in HEVC shares many similarities with CABAC of H.264, but contains several improvements. Those include improvements in coding efficiency and lowered implementation complexity, especially for parallel architectures.

In-loop filtering

H.264 includes an in-loop adaptive deblocking filter, where the blocking artifacts around the transform edges in the reconstructed picture are smoothed to improve the picture quality and compression efficiency. In HEVC, a similar deblocking filter is employed but with somewhat lower complexity. In addition, pictures undergo a subsequent filtering operation called Sample Adaptive Offset (SAO), which is a new design element in HEVC. SAO basically adds a pixel-level offset in an adaptive manner and usually acts as a de-ringing filter. It is observed that SAO improves the picture quality, especially around sharp edges, contributing substantially to visual quality improvements of HEVC.

Motion prediction and coding

There have been a number of improvements in this area that are summarized as follows. The first category is motion merge and Advanced Motion Vector Prediction (AMVP) modes. The motion information of a prediction block can be inferred from the spatially or temporally neighboring blocks. This is similar to the DIRECT mode in H.264 but includes new aspects to incorporate the flexible quad-tree structure and methods to improve the parallel implementations. In addition, the motion vector predictor can be signaled for improved efficiency. The second category is high-precision interpolation. The interpolation filter length is increased to 8-tap from 6-tap, which improves the coding efficiency but also comes with increased complexity. In addition, the interpolation filter is defined with higher precision without any intermediate rounding operations to further improve the coding efficiency.

Intra prediction and intra-coding
Compared to eight intra prediction modes in H.264, HEVC supports angular intra prediction with 33 directions. This increased flexibility improves both objective coding efficiency and visual quality, as the edges can be better predicted and ringing artifacts around the edges can be reduced. In addition, the reference samples are adaptively smoothed based on the prediction direction. To avoid contouring artifacts, a new interpolative prediction generation is included to improve the visual quality. Furthermore, Discrete Sine Transform (DST) is utilized instead of traditional Discrete Cosine Transform (DCT) for 4x4 intra-transform blocks.

Other coding-tool features
HEVC includes some tools for lossless coding and efficient screen-content coding, such as skipping the transform for certain blocks. These tools are particularly useful, for example, when streaming the user interface of a mobile device to a large display.

1.1.2. Systems and Transport Interfaces

HEVC inherited the basic systems and transport interface designs from H.264. These include the NAL-unit-based syntax structure, the hierarchical syntax and data unit structure, the Supplemental Enhancement Information (SEI) message mechanism, and the video buffering model based on the Hypothetical Reference Decoder (HRD). The hierarchical syntax and data unit structure consists of sequence-level parameter sets, multi-picture-level or picture-level parameter sets, slice-level header parameters, and lower-level parameters. The differences from H.264 in these aspects are summarized below.

Video parameter set (VPS)

A new type of parameter set, called Video Parameter Set (VPS), was introduced. For the first (2013) version of [HEVC], the VPS NAL unit is required to be available prior to its activation, while the information contained in the VPS is not necessary for operation of the decoding process. For future HEVC extensions, such as the 3D or scalable extensions, the VPS is expected to include information necessary for operation of the decoding process, e.g., decoding dependency or information for reference picture set construction of enhancement layers. The VPS provides a “big picture” of a bitstream, including what types of operation points are provided, the profile, tier, and level of the operation points, and some other high-level properties of the bitstream that can be used as the basis for session negotiation and content selection, etc. (see Section 7.1).

Profile, tier, and level
The profile, tier, and level syntax structure that can be included in both the VPS and Sequence Parameter Set (SPS) includes 12 bytes of data to describe the entire bitstream (including all temporally scalable layers, which are referred to as sub-layers in the HEVC specification), and can optionally include more profile, tier, and level information pertaining to individual temporally scalable layers. The profile indicator shows the “best viewed as” profile when the bitstream conforms to multiple profiles, similar to the major brand concept in the ISO Base Media File Format (ISOBMFF) [IS014496-12] [IS015444-12] and file formats derived based on ISOBMFF, such as the 3GPP file format [3GPPFF]. The profile, tier, and level syntax structure also includes indications such as 1) whether the bitstream is free of frame-packed content, 2) whether the bitstream is free of interlaced source content, and 3) whether the bitstream is free of field pictures. When the answer is yes for both 2) and 3), the bitstream contains only frame pictures of progressive source. Based on these indications, clients/players without support of post-processing functionalities for the handling of frame-packed, interlaced source content or field pictures can reject those bitstreams that contain such pictures.
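The rejection logic described above can be sketched as follows. The class and field names are hypothetical labels for the three indications and for a player's post-processing abilities; they are not the syntax element names used in [HEVC], and a real client would read the indications from the parsed profile, tier, and level structure.

```python
from dataclasses import dataclass


@dataclass
class SourceIndications:
    """Hypothetical container for the three indications listed above."""
    free_of_frame_packing: bool
    free_of_interlaced_source: bool
    free_of_field_pictures: bool


@dataclass
class PlayerCapabilities:
    """Hypothetical description of a player's post-processing support."""
    can_unpack_frames: bool = False
    can_deinterlace: bool = False
    can_handle_fields: bool = False


def is_progressive_frames_only(ind):
    # Indications 2) and 3) together imply frame pictures of progressive source.
    return ind.free_of_interlaced_source and ind.free_of_field_pictures


def should_reject(ind, caps):
    """Reject a bitstream that may need post-processing the player lacks."""
    if not ind.free_of_frame_packing and not caps.can_unpack_frames:
        return True
    if not ind.free_of_interlaced_source and not caps.can_deinterlace:
        return True
    if not ind.free_of_field_pictures and not caps.can_handle_fields:
        return True
    return False
```

A player with no post-processing support accepts only bitstreams for which all three indications are set.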

Bitstream and elementary stream

HEVC includes a definition of an elementary stream, which is new compared to H.264. An elementary stream consists of a sequence of one or more bitstreams. An elementary stream that consists of two or more bitstreams has typically been formed by splicing together two or more bitstreams (or parts thereof). When an elementary stream contains more than one bitstream, the last NAL unit of the last access unit of a bitstream (except the last bitstream in the elementary stream) must contain an end of bitstream NAL unit, and the first access unit of the subsequent bitstream must be an Intra-Random Access Point (IRAP) access unit. This IRAP access unit may be a Clean Random Access (CRA), Broken Link Access (BLA), or Instantaneous Decoding Refresh (IDR) access unit.
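The splicing rule above can be checked mechanically. The sketch below assumes a simplified in-memory layout chosen only for illustration: each bitstream is a non-empty list of access units, and each access unit is a dict with a "picture" kind and a "nal_units" list of NAL unit names in decoding order, where "EOB" stands for the end of bitstream NAL unit.

```python
IRAP_KINDS = {"IDR", "CRA", "BLA"}


def check_elementary_stream(bitstreams):
    """Return a list of problems found at the splice points."""
    problems = []
    for index in range(len(bitstreams) - 1):
        last_au = bitstreams[index][-1]
        next_first_au = bitstreams[index + 1][0]
        # Every bitstream except the last must end with an end of bitstream
        # NAL unit as the last NAL unit of its last access unit.
        if not last_au["nal_units"] or last_au["nal_units"][-1] != "EOB":
            problems.append(
                "bitstream %d does not end with an end of bitstream NAL unit"
                % index)
        # The following bitstream must start with an IRAP access unit.
        if next_first_au["picture"] not in IRAP_KINDS:
            problems.append(
                "bitstream %d does not start with an IRAP access unit"
                % (index + 1))
    return problems
```

An empty result means every splice point satisfies both conditions; the check says nothing about the conformance of the individual bitstreams.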

Random access support

Temporal scalability support

HEVC includes an improved support of temporal scalability, by inclusion of the signaling of TemporalId in the NAL unit header, the restriction that pictures of a particular temporal sub-layer cannot be used for inter prediction reference by pictures of a lower temporal sub-layer, the sub-bitstream extraction process, and the requirement that each sub-bitstream extraction output be a conforming bitstream. Media-Aware Network Elements (MANEs) can utilize the TemporalId in the NAL unit header for stream adaptation purposes based on temporal scalability.
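As a rough illustration of how a MANE might thin a stream by temporal sub-layer, the sketch below reads the TemporalId from the two-byte NAL unit header and drops NAL units above a target sub-layer. It assumes each element of the input is one complete NAL unit given as bytes; the helper names are illustrative, and a real MANE would also have to handle the RTP payload structures and the full sub-bitstream extraction rules of [HEVC].

```python
def temporal_id(nal_unit):
    """TemporalId of a NAL unit given as bytes (two-byte header first)."""
    tid_plus1 = nal_unit[1] & 0x07      # low three bits of the second byte
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return tid_plus1 - 1


def thin_stream(nal_units, max_temporal_id):
    """Keep only NAL units whose TemporalId does not exceed the target."""
    return [n for n in nal_units if temporal_id(n) <= max_temporal_id]


def make_header(nal_type, tid, layer_id=0):
    """Build a two-byte header for test data (F bit left at 0)."""
    first = (nal_type << 1) | (layer_id >> 5)
    second = ((layer_id & 0x1F) << 3) | (tid + 1)
    return bytes([first, second])
```

Because pictures of a lower sub-layer never reference pictures of a higher one, the units that remain after thinning are still decodable.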

Temporal sub-layer switching support
HEVC specifies, through NAL unit types present in the NAL unit header, the signaling of Temporal Sub-layer Access (TSA) and Step-wise Temporal Sub-layer Access (STSA). A TSA picture and pictures following the TSA picture in decoding order do not use pictures prior to the TSA picture in decoding order with TemporalId greater than or equal to that of the TSA picture for inter prediction reference. A TSA picture enables up-switching, at the TSA picture, to the sub-layer containing the TSA picture or any higher sub-layer, from the immediately lower sub-layer. An STSA picture does not use pictures with the same TemporalId as the STSA picture for inter prediction reference. Pictures following an STSA picture in decoding order with the same TemporalId as the STSA picture do not use pictures prior to the STSA picture in decoding order with the same TemporalId as the STSA picture for inter prediction reference. An STSA picture enables up-switching, at the STSA picture, to the sub-layer containing the STSA picture, from the immediately lower sub-layer.
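The difference between the two switching points can be summarized in a small helper. This is a sketch under stated assumptions: the picture kind is passed as a plain string label rather than derived from nal_unit_type, and the function name and parameters are illustrative.

```python
def up_switch_targets(kind, picture_tid, current_tid, highest_tid):
    """Sub-layers a receiver may switch up to at a TSA or STSA picture.

    kind         -- "TSA" or "STSA" (string labels chosen for this sketch)
    picture_tid  -- TemporalId of the switching-point picture
    current_tid  -- highest TemporalId currently being decoded
    highest_tid  -- highest TemporalId present in the bitstream
    """
    # Both picture types only permit switching from the immediately lower
    # sub-layer.
    if current_tid != picture_tid - 1:
        return []
    if kind == "TSA":
        # The TSA picture's own sub-layer or any higher one.
        return list(range(picture_tid, highest_tid + 1))
    if kind == "STSA":
        # Only the sub-layer containing the STSA picture.
        return [picture_tid]
    raise ValueError("not a sub-layer switching point: %r" % kind)
```

A receiver decoding only sub-layer 0 that meets a TSA picture with TemporalId 1 may jump to any sub-layer from 1 upward, whereas an STSA picture in the same position only opens sub-layer 1.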

Sub-layer reference or non-reference pictures

The concept and signaling of reference/non-reference pictures in HEVC are different from H.264. In H.264, if a picture may be used by any other picture for inter prediction reference, it is a reference picture; otherwise, it is a non-reference picture, and this is signaled by two bits in the NAL unit header. In HEVC, a picture is called a reference picture only when it is marked as “used for reference”. In addition, the concept of sub-layer reference picture was introduced. If a picture may be used by any other picture with the same TemporalId for inter prediction reference, it is a sub-layer reference picture; otherwise, it is a sub-layer non-reference picture. Whether a picture is a sub-layer reference picture or sub-layer non-reference picture is signaled through NAL unit type values.

Extensibility

Besides the TemporalId in the NAL unit header, HEVC also includes the signaling of a six-bit layer ID in the NAL unit header, which must be equal to 0 for a single-layer bitstream. Extension mechanisms have been included in the VPS, SPS, Picture Parameter Set (PPS), SEI NAL unit, slice headers, and so on. All these extension mechanisms enable future extensions in a backward-compatible manner, such that bitstreams encoded according to potential future HEVC extensions can be fed to then-legacy decoders (e.g., HEVC version 1 decoders), and the then-legacy decoders can decode and output the base-layer bitstream.

Bitstream extraction
HEVC includes a bitstream-extraction process as an integral part of the overall decoding process. The bitstream extraction process is used in the process of bitstream conformance tests, which is part of the HRD buffering model.

Reference picture management

The reference picture management of HEVC, including reference picture marking and removal from the Decoded Picture Buffer (DPB) as well as Reference Picture List Construction (RPLC), differs from that of H.264. Instead of the reference picture marking mechanism based on a sliding window plus adaptive Memory Management Control Operation (MMCO) described in H.264, HEVC specifies a reference picture management and marking mechanism based on Reference Picture Set (RPS), and the RPLC is consequently based on the RPS mechanism. An RPS consists of a set of reference pictures associated with a picture, consisting of all reference pictures that are prior to the associated picture in decoding order, that may be used for inter prediction of the associated picture or any picture following the associated picture in decoding order. The reference picture set consists of five lists of reference pictures: RefPicSetStCurrBefore, RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr, and RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetLtCurr contain all reference pictures that may be used in inter prediction of the current picture and that may be used in inter prediction of one or more of the pictures following the current picture in decoding order. RefPicSetStFoll and RefPicSetLtFoll consist of all reference pictures that are not used in inter prediction of the current picture but may be used in inter prediction of one or more of the pictures following the current picture in decoding order. RPS provides an “intra-coded” signaling of the DPB status, instead of an “inter-coded” signaling, mainly for improved error resilience. The RPLC process in HEVC is based on the RPS, by signaling an index to an RPS subset for each reference index; this process is simpler than the RPLC process in H.264.
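A minimal sketch of the idea follows, assuming pictures are identified by their picture order count (POC) and that the five lists have already been derived from the slice header. The class, the method names, and the `apply_rps` helper are illustrative and do not mirror the derivation process in [HEVC]. Because the RPS lists every picture that must still be held as a reference, a decoder can tell which DPB entries are no longer needed for reference and which expected references are absent, for example after packet loss.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePictureSet:
    """The five RPS lists, holding picture order count (POC) values."""
    st_curr_before: list = field(default_factory=list)  # RefPicSetStCurrBefore
    st_curr_after: list = field(default_factory=list)   # RefPicSetStCurrAfter
    st_foll: list = field(default_factory=list)         # RefPicSetStFoll
    lt_curr: list = field(default_factory=list)         # RefPicSetLtCurr
    lt_foll: list = field(default_factory=list)         # RefPicSetLtFoll

    def usable_by_current(self):
        """Pictures the current picture may reference."""
        return (set(self.st_curr_before) | set(self.st_curr_after)
                | set(self.lt_curr))

    def kept_for_later(self):
        """Pictures held only for pictures that follow in decoding order."""
        return set(self.st_foll) | set(self.lt_foll)

    def all_pictures(self):
        return self.usable_by_current() | self.kept_for_later()


def apply_rps(dpb_pocs, rps):
    """Return (kept, dropped, missing) POC lists for the current picture.

    kept    -- DPB pictures that stay marked as reference pictures
    dropped -- DPB pictures no longer needed for reference
    missing -- pictures the RPS expects but the DPB does not hold
    """
    wanted = rps.all_pictures()
    kept = sorted(p for p in dpb_pocs if p in wanted)
    dropped = sorted(p for p in dpb_pocs if p not in wanted)
    missing = sorted(wanted - set(dpb_pocs))
    return kept, dropped, missing
```

A non-empty `missing` list is the kind of loss detection that the absolute, per-picture signaling makes possible.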

Ultra-low delay support

HEVC specifies a sub-picture-level HRD operation, for support of the so-called ultra-low delay. The mechanism specifies a standard-compliant way to enable delay reduction below a one-picture interval. Coded Picture Buffer (CPB) and DPB parameters at the sub-picture level may be signaled, and utilization of this information for the derivation of CPB timing (wherein the CPB removal time corresponds to decoding time) and DPB output timing (display time) is specified. Decoders are allowed to operate the HRD at the conventional access-unit level, even when the sub-picture-level HRD parameters are present.

New SEI messages

HEVC inherits many H.264 SEI messages with changes in syntax and/or semantics making them applicable to HEVC. Additionally, there are a few new SEI messages reviewed briefly in the following paragraphs.
The display orientation SEI message informs the decoder of a transformation that is recommended to be applied to the cropped decoded picture prior to display, such that the pictures can be properly displayed, e.g., in an upside-up manner.
The structure of pictures SEI message provides information on the NAL unit types, picture-order count values, and prediction dependencies of a sequence of pictures. The SEI message can be used, for example, for concluding what impact a lost picture has on other pictures.
The decoded picture hash SEI message provides a checksum derived from the sample values of a decoded picture. It can be used for detecting whether a picture was correctly received and decoded.
The active parameter sets SEI message includes the IDs of the active video parameter set and the active sequence parameter set and can be used to activate VPSs and SPSs. In addition, the SEI message includes the following indications: 1) An indication of whether “full random accessibility” is supported (when supported, all parameter sets needed for decoding of the remainder of the bitstream when random accessing from the beginning of the current CVS by completely discarding all access units earlier in decoding order are present in the remaining bitstream, and all coded pictures in the remaining bitstream can be correctly decoded); 2) An indication of whether there is no parameter set within the current CVS that updates another parameter set of the same type preceding in decoding order. An update of a parameter set refers to the use of the same parameter set ID but with some other parameters changed. If this property is true for all CVSs in the bitstream, then all parameter sets can be sent out-of-band before session start.
The decoding unit information SEI message provides information regarding coded picture buffer removal delay for a decoding unit. The message can be used in very-low-delay buffering operations.
The region refresh information SEI message can be used together with the recovery point SEI message (present in both H.264 and HEVC) for improved support of gradual decoding refresh. This supports random access from inter-coded pictures, wherein complete pictures can be correctly decoded or recovered after an indicated number of pictures in output/display order.
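The use of the decoded picture hash SEI message described above can be sketched as a comparison of digests. This is only an illustration: [HEVC] defines the permitted hash methods and the exact arrangement of sample values that is hashed, whereas the sketch assumes 8-bit samples stored one byte each per colour plane and uses MD5 from Python's hashlib. The function names are hypothetical.

```python
import hashlib


def plane_digests(planes):
    """MD5 digest per colour plane; each plane is a bytes object of samples."""
    return [hashlib.md5(plane).hexdigest() for plane in planes]


def picture_matches(planes, expected_digests):
    """True when every decoded plane hashes to the digest the sender computed."""
    return plane_digests(planes) == list(expected_digests)
```

A receiver would compute the digests over its own decoded picture and compare them with the values carried in the SEI message; a mismatch indicates a loss or a decoding error somewhere in the prediction chain.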

1.1.3. Parallel Processing Support

The reportedly significantly higher encoding computational demand of HEVC over H.264, in conjunction with the ever-increasing video resolution (both spatially and temporally) required by the market, led to the adoption of VCL coding tools specifically targeted to allow for parallelization on the sub-picture level. That is, parallelization occurs, at the minimum, at the granularity of an integer number of CTUs. The targets for this type of high-level parallelization are multicore CPUs and DSPs as well as multiprocessor systems. In a system design, to be useful, these tools require signaling support, which is provided in Section 7 of this memo. This section provides a brief overview of the tools available in [HEVC].
Many of the tools incorporated in HEVC were designed keeping in mind the potential parallel implementations in multicore/multiprocessor architectures. Specifically, for parallelization, four picture partition strategies, as described below, are available.
Slices are segments of the bitstream that can be reconstructed independently from other slices within the same picture (though there may still be interdependencies through loop filtering operations). Slices are the only tool that can be used for parallelization that is also available, in virtually identical form, in H.264. Parallelization based on slices does not require much inter-processor or inter-core communication (except for inter-processor or inter-core data sharing for motion compensation when decoding a predictively coded picture, which is typically much heavier than inter-processor or inter-core data sharing due to in-picture prediction), as slices are designed to be independently decodable. However, for the same reason, slices can require some coding overhead. Further, slices (in contrast to some of the other tools mentioned below) also serve as the key mechanism for bitstream partitioning to match Maximum Transfer Unit (MTU) size requirements, due to the in-picture independence of slices and the fact that each regular slice is encapsulated in its own NAL unit. In many cases, the goal of parallelization and the goal of MTU size matching can place contradicting demands to the slice layout in a picture. The realization of this situation led to the development of the more advanced tools mentioned below.
Dependent slice segments allow for fragmentation of a coded slice into fragments at CTU boundaries without breaking any in-picture prediction mechanisms. They are complementary to the fragmentation mechanism described in this memo in that they need the cooperation of the encoder. As a dependent slice segment necessarily contains an integer number of CTUs, a decoder using multiple cores operating on CTUs can process a dependent slice segment without communicating parts of the slice segment’s bitstream to other cores. Fragmentation, as specified in this memo, in contrast, does not guarantee that a fragment contains an integer number of CTUs.
In Wavefront Parallel Processing (WPP), the picture is partitioned into rows of CTUs. Entropy decoding and prediction are allowed to use data from CTUs in other partitions. Parallel processing is possible through parallel decoding of CTU rows, where the start of the decoding of a row is delayed by two CTUs, so as to ensure that data related to a CTU above and to the right of the subject CTU is available before the subject CTU is decoded. Using this staggered start (which appears like a wavefront when represented graphically), parallelization is possible with up to as many processors/cores as the picture contains CTU rows.
Because in-picture prediction between neighboring CTU rows within a picture is allowed, the required inter-processor/inter-core communication to enable in-picture prediction can be substantial. The WPP partitioning does not result in the creation of more NAL units compared to when it is not applied; thus, WPP cannot be used for MTU size matching, though slices can be used in combination for that purpose.
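The staggered start of WPP can be modeled with a toy schedule. The sketch assumes that every CTU takes exactly one time step to decode and that one core is available per CTU row, which real decoders do not guarantee; the function names are illustrative. Under these assumptions, the CTU in row r and column c can start at step c + 2r, because the CTU above and to the right of it has finished by then.

```python
def wpp_start_step(row, col, row_delay=2):
    """Earliest step at which CTU (row, col) can start decoding."""
    return col + row_delay * row


def wpp_total_steps(rows, cols, row_delay=2):
    """Steps needed to decode the whole picture with one core per CTU row."""
    return wpp_start_step(rows - 1, cols - 1, row_delay) + 1


def wpp_peak_parallelism(rows, cols, row_delay=2):
    """Largest number of CTU rows being decoded during the same step."""
    peak = 0
    for step in range(wpp_total_steps(rows, cols, row_delay)):
        active = sum(
            1 for row in range(rows)
            if 0 <= step - row_delay * row < cols
        )
        peak = max(peak, active)
    return peak
```

For a picture of 17 rows by 30 columns of CTUs, the model needs 62 steps instead of 510 sequential ones, with at most 15 rows active at the same time.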
Tiles define horizontal and vertical boundaries that partition a picture into tile columns and rows. The scan order of CTUs is changed to be local within a tile (in the order of a CTU raster scan of a tile), before decoding the top-left CTU of the next tile in the order of tile raster scan of a picture. Similar to slices, tiles break in-picture prediction dependencies (including entropy decoding dependencies). However, they do not need to be included into individual NAL units (same as WPP in this regard); hence, tiles cannot be used for MTU size matching, though slices can be used in combination for that purpose. Each tile can be processed by one processor/core, and the inter-processor/inter-core communication required for in-picture prediction between processing units decoding neighboring tiles is limited to conveying the shared slice header in cases where a slice spans more than one tile, and loop-filtering-related sharing of reconstructed samples and metadata. In this respect, tiles are less demanding in terms of inter-processor communication bandwidth compared to WPP due to the in-picture independence between two neighboring partitions.
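The change in CTU scan order caused by tiles can be illustrated with a short function that lists raster-scan CTU addresses in tile scan order. The function name and its parameters (picture size in CTUs, tile column widths, and tile row heights) are assumptions made for this sketch and do not correspond to syntax elements in [HEVC].

```python
def tile_scan_order(width_ctus, height_ctus, column_widths, row_heights):
    """Raster-scan CTU addresses listed in tile scan order."""
    if sum(column_widths) != width_ctus or sum(row_heights) != height_ctus:
        raise ValueError("tile sizes must add up to the picture size")
    order = []
    top = 0
    for tile_height in row_heights:            # tiles in raster order ...
        left = 0
        for tile_width in column_widths:
            for y in range(top, top + tile_height):       # ... CTUs in raster
                for x in range(left, left + tile_width):  # order inside a tile
                    order.append(y * width_ctus + x)
            left += tile_width
        top += tile_height
    return order
```

For a picture of 4x2 CTUs split into two tile columns of width 2, the order is 0, 1, 4, 5, 2, 3, 6, 7: the left tile is finished before the top-left CTU of the right tile is decoded.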

1.1.4. NAL Unit Header

HEVC maintains the NAL unit concept of H.264 with modifications. HEVC uses a two-byte NAL unit header, as shown in Figure 1. The payload of a NAL unit refers to the NAL unit excluding the NAL unit header.

+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Type    |  LayerId  | TID |
+-------------+-----------------+
Figure 1: The Structure of the HEVC NAL Unit Header


The semantics of the fields in the NAL unit header are as specified in [HEVC] and described briefly below for convenience. In addition to the name and size of each field, the corresponding syntax element name in [HEVC] is also provided.
F: 1 bit forbidden_zero_bit. Required to be zero in [HEVC]. Note that the inclusion of this bit in the NAL unit header was to enable transport of HEVC video over MPEG-2 transport systems (avoidance of start code emulations) [MPEG2S]. In the context of this memo, the value 1 may be used to indicate a syntax violation, e.g., for a NAL unit resulting from aggregating a number of fragmented units of a NAL unit but missing the last fragment, as described in Section 4.4.3.
Type: 6 bits nal_unit_type. This field specifies the NAL unit type as defined in Table 7-1 of [HEVC]. If the most significant bit of this field of a NAL unit is equal to 0 (i.e., the value of this field is less than 32), the NAL unit is a VCL NAL unit. Otherwise, the NAL unit is a non-VCL NAL unit. For a reference of all currently defined NAL unit types and their semantics, please refer to Section 7.4.2 in [HEVC].
LayerId: 6 bits nuh_layer_id. Required to be equal to zero in [HEVC]. It is anticipated that in future scalable or 3D video coding extensions of this specification, this syntax element will be used to identify additional layers that may be present in the CVS, wherein a layer may be, e.g., a spatial scalable layer, a quality scalable layer, a texture view, or a depth view.
TID: 3 bits nuh_temporal_id_plus1. This field specifies the temporal identifier of the NAL unit plus 1. The value of TemporalId is equal to TID minus 1. A TID value of 0 is illegal to ensure that there is at least one bit in the NAL unit header equal to 1, so as to enable independent considerations of start code emulations in the NAL unit header and in the NAL unit payload data.
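The field layout of Figure 1 translates directly into shifts and masks. The following sketch parses the two header bytes into the four fields, derives TemporalId, and classifies the NAL unit as VCL or non-VCL using the "less than 32" rule stated above. The `NalHeader` tuple and the function name are illustrative and not part of any specification.

```python
from collections import namedtuple

NalHeader = namedtuple(
    "NalHeader", "f nal_type layer_id tid temporal_id is_vcl")


def parse_nal_header(data):
    """Parse the two-byte HEVC NAL unit header at the start of `data`."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    first, second = data[0], data[1]
    f = first >> 7                                    # 1 bit
    nal_type = (first >> 1) & 0x3F                    # 6 bits
    layer_id = ((first & 0x01) << 5) | (second >> 3)  # 6 bits, both bytes
    tid = second & 0x07                               # 3 bits
    if tid == 0:
        raise ValueError("TID value 0 is illegal")
    return NalHeader(f, nal_type, layer_id, tid, tid - 1, nal_type < 32)
```

For example, the bytes 0x40 0x01 decode to type 32 (a non-VCL NAL unit) with LayerId 0 and TemporalId 0.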

1.2. Overview of the Payload Format

This payload format defines the following processes required for transport of HEVC coded data over RTP [RFC3550]:
o Usage of RTP header with this payload format
o Packetization of HEVC coded NAL units into RTP packets using three types of payload structures: a single NAL unit packet, aggregation packet, and fragment unit
o Transmission of HEVC NAL units of the same bitstream within a single RTP stream or multiple RTP streams (within one or more RTP sessions), where within an RTP stream transmission of NAL units may be either non-interleaved (i.e., the transmission order of NAL units is the same as their decoding order) or interleaved (i.e., the transmission order of NAL units is different from the decoding order)
o Media type parameters to be used with the Session Description Protocol (SDP) [RFC4566]
o A payload header extension mechanism and data structures for enhanced support of temporal scalability based on that extension mechanism.

2 Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119].
In this document, the above key words will convey that interpretation only when in ALL CAPS. Lowercase uses of these words are not to be interpreted as carrying the significance described in RFC 2119.
This specification uses the notion of setting and clearing a bit when bit fields are handled. Setting a bit is the same as assigning that bit the value of 1 (On). Clearing a bit is the same as assigning that bit the value of 0 (Off).

3 Definitions and Abbreviations

3.1. Definitions
This document uses the terms and definitions of [HEVC]. Section 3.1.1 lists relevant definitions from [HEVC] for convenience. Section 3.1.2 provides definitions specific to this memo.

3.1.1. Definitions from the HEVC Specification

access unit: A set of NAL units that are associated with each other according to a specified classification rule, that are consecutive in decoding order, and that contain exactly one coded picture.
BLA access unit: An access unit in which the coded picture is a BLA picture.
BLA picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
Coded Video Sequence (CVS): A sequence of access units that consists, in decoding order, of an IRAP access unit with NoRaslOutputFlag equal to 1, followed by zero or more access units that are not IRAP access units with NoRaslOutputFlag equal to 1, including all subsequent access units up to but not including any subsequent access unit that is an IRAP access unit with NoRaslOutputFlag equal to 1.
Informative note: An IRAP access unit may be an IDR access unit, a BLA access unit, or a CRA access unit. The value of NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA access unit, and each CRA access unit that is the first access unit in the bitstream in decoding order, is the first access unit that follows an end of sequence NAL unit in decoding order, or has HandleCraAsBlaFlag equal to 1.
CRA access unit: An access unit in which the coded picture is a CRA picture.
CRA picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to CRA_NUT.
IDR access unit: An access unit in which the coded picture is an IDR picture.
IDR picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
IRAP access unit: An access unit in which the coded picture is an IRAP picture.
IRAP picture: A coded picture for which each VCL NAL unit has nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23), inclusive.
layer: A set of VCL NAL units that all have a particular value of nuh_layer_id and the associated non-VCL NAL units, or one of a set of syntactical structures having a hierarchical relationship.
operation point: A bitstream created from another bitstream by operation of the sub-bitstream extraction process with that other bitstream, a target highest TemporalId, and a target-layer identifier list as inputs.
random access: The act of starting the decoding process for a bitstream at a point other than the beginning of the bitstream.
sub-layer: A temporal scalable layer of a temporal scalable bitstream consisting of VCL NAL units with a particular value of the TemporalId variable, and the associated non-VCL NAL units.
sub-layer representation: A subset of the bitstream consisting of NAL units of a particular sub-layer and the lower sub-layers.
tile: A rectangular region of coding tree blocks within a particular tile column and a particular tile row in a picture.
tile column: A rectangular region of coding tree blocks having a height equal to the height of the picture and a width specified by syntax elements in the picture parameter set.
tile row: A rectangular region of coding tree blocks having a height specified by syntax elements in the picture parameter set and a width equal to the width of the picture.


3.1.2. Definitions Specific to This Memo

dependee RTP stream: An RTP stream on which another RTP stream depends. When Multiple RTP streams on a Single media Transport (MRST) or Multiple RTP streams on Multiple media Transports (MRMT) is in use, all RTP streams except the highest RTP stream are dependee RTP streams.
highest RTP stream: The RTP stream on which no other RTP stream depends. The RTP stream in a Single RTP stream on a Single media Transport (SRST) is the highest RTP stream.
Media-Aware Network Element (MANE): A network element, such as a middlebox, selective forwarding unit, or application-layer gateway that is capable of parsing certain aspects of the RTP payload headers or the RTP payload and reacting to their contents.
Informative note: The concept of a MANE goes beyond normal routers or gateways in that a MANE has to be aware of the signaling (e.g., to learn about the payload type mappings of the media streams), and in that it has to be trusted when working with Secure RTP (SRTP). The advantage of using MANEs is that they allow packets to be dropped according to the needs of the media coding. For example, if a MANE has to drop packets due to congestion on a certain link, it can identify and remove those packets whose elimination produces the least adverse effect on the user experience. After dropping packets, MANEs must rewrite RTCP packets to match the changes to the RTP stream, as specified in Section 7 of [RFC3550].
Media Transport: As used in the MRST, MRMT, and SRST definitions below, Media Transport denotes the transport of packets over a transport association identified by a 5-tuple (source address, source port, destination address, destination port, transport protocol). See also Section 2.1.13 of [RFC7656].
Informative note: The term “bitstream” in this document is equivalent to the term “encoded stream” in [RFC7656].

Multiple RTP streams on a Single media Transport (MRST): Multiple RTP streams carrying a single HEVC bitstream on a Single Transport. See also Section 3.5 of [RFC7656].
Multiple RTP streams on Multiple media Transports (MRMT): Multiple RTP streams carrying a single HEVC bitstream on Multiple Transports. See also Section 3.5 of [RFC7656].
NAL unit decoding order: A NAL unit order that conforms to the constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].
NAL unit output order: A NAL unit order in which NAL units of different access units are in the output order of the decoded pictures corresponding to the access units, as specified in [HEVC], and in which NAL units within an access unit are in their decoding order.
NAL-unit-like structure: A data structure that is similar to NAL units in the sense that it also has a NAL unit header and a payload, with a difference that the payload does not follow the start code emulation prevention mechanism required for the NAL unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples of NAL-unit-like structures defined in this memo are packet payloads of Aggregation Packet (AP), PAyload Content Information (PACI), and Fragmentation Unit (FU) packets.
NALU-time: The value that the RTP timestamp would have if the NAL unit were transported in its own RTP packet.
RTP stream: See [RFC7656]. Within the scope of this memo, one RTP stream is utilized to transport one or more temporal sub-layers.
Single RTP stream on a Single media Transport (SRST): Single RTP stream carrying a single HEVC bitstream on a Single (Media) Transport. See also Section 3.5 of [RFC7656].
transmission order: The order of packets in ascending RTP sequence number order (in modulo arithmetic). Within an aggregation packet, the NAL unit transmission order is the same as the order of appearance of NAL units in the packet.


3.2. Abbreviations

AP Aggregation Packet
CRA Clean Random Access
CTB Coding Tree Block
CTU Coding Tree Unit
CVS Coded Video Sequence
DPH Decoded Picture Hash
FU Fragmentation Unit
HRD Hypothetical Reference Decoder
IDR Instantaneous Decoding Refresh
IRAP Intra Random Access Point
MANE Media-Aware Network Element
MRMT Multiple RTP streams on Multiple media Transports
MRST Multiple RTP streams on a Single media Transport
MTU Maximum Transfer Unit
NAL Network Abstraction Layer
NALU Network Abstraction Layer Unit
PPS Picture Parameter Set
RASL Random Access Skipped Leading (Picture)
RPS Reference Picture Set
SEI Supplemental Enhancement Information
SPS Sequence Parameter Set
SRST Single RTP stream on a Single media Transport
STSA Step-wise Temporal Sub-layer Access
TSA Temporal Sub-layer Access
TSCI Temporal Scalability Control Information
VCL Video Coding Layer
VPS Video Parameter Set

4 RTP Payload Format

4.1. RTP Header Usage

The format of the RTP header is specified in [RFC3550] (reprinted as Figure 2 for convenience). This payload format uses the fields of the header in a manner consistent with that specification.
The RTP payload (and the settings for some RTP header bits) for aggregation packets and fragmentation units are specified in Sections 4.4.2 and 4.4.3, respectively.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: RTP Header According to [RFC3550]


The RTP header information to be set according to this RTP payload format is set as follows:

Marker bit (M): 1 bit

Set for the last packet of the access unit, carried in the current RTP stream. This is in line with the normal use of the M bit in video formats to allow an efficient playout buffer handling. When MRST or MRMT is in use, if an access unit appears in multiple RTP streams, the marker bit is set on each RTP stream’s last packet of the access unit.
Informative note: The content of a NAL unit does not tell whether or not the NAL unit is the last NAL unit, in decoding order, of an access unit. An RTP sender implementation may obtain this information from the video encoder. If, however, the implementation cannot obtain this information directly from the encoder, e.g., when the bitstream was pre-encoded, and also there is no timestamp allocated for each NAL unit, then the sender implementation can inspect subsequent NAL units in decoding order to determine whether or not the NAL unit is the last NAL unit of an access unit as follows. A NAL unit is determined to be the last NAL unit of an access unit if it is the last NAL unit of the bitstream. A NAL unit naluX is also determined to be the last NAL unit of an access unit if both the following conditions are true: 1) the next VCL NAL unit naluY in decoding order has the high-order bit of the first byte after its NAL unit header equal to 1, and 2) all NAL units between naluX and naluY, when present, have nal_unit_type in the range of 32 to 35, inclusive, equal to 39, or in the ranges of 41 to 44, inclusive, or 48 to 55, inclusive.
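The inspection procedure in the informative note above can be sketched as follows. This is a minimal illustration, not part of the payload format: the function names are hypothetical, NAL units are assumed to be given as byte strings that start with the two-byte NAL unit header, and nal_unit_type values 0 to 31 are assumed to denote VCL NAL units as in [HEVC]. The sketch applies the two stated conditions literally and returns False whenever neither rule lets it decide.

```python
from typing import Sequence


def nal_unit_type(nalu: bytes) -> int:
    """Return the 6-bit Type field from the first byte of the NAL unit header."""
    return (nalu[0] >> 1) & 0x3F


def _allowed_before_first_vcl(nal_type: int) -> bool:
    # Type ranges listed in the informative note: 32..35, 39, 41..44, 48..55.
    return (32 <= nal_type <= 35 or nal_type == 39
            or 41 <= nal_type <= 44 or 48 <= nal_type <= 55)


def is_last_nalu_of_access_unit(nalus: Sequence[bytes], index: int) -> bool:
    """Decide whether nalus[index] ends an access unit (decoding order)."""
    if index == len(nalus) - 1:
        return True  # last NAL unit of the bitstream
    for following in nalus[index + 1:]:
        if nal_unit_type(following) < 32:  # assumption: types 0..31 are VCL
            # High-order bit of the first byte after the 2-byte header.
            return len(following) > 2 and bool(following[2] & 0x80)
        if not _allowed_before_first_vcl(nal_unit_type(following)):
            return False
    return False  # no later VCL NAL unit found; the rules do not decide
```

A sender would typically call this for the NAL unit it is about to packetize and set the M bit on the last RTP packet carrying that NAL unit when the function returns True.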

Payload Type (PT): 7 bits
The assignment of an RTP payload type for this new packet format is outside the scope of this document and will not be specified here. The assignment of a payload type has to be performed either through the profile used or in a dynamic way.
Informative note: It is not required to use different payload type values for different RTP streams in MRST or MRMT.

Sequence Number (SN): 16 bits
Set and used in accordance with [RFC3550].

Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the content. A 90 kHz clock rate MUST be used.
If the NAL unit has no timing properties of its own (e.g., parameter set and SEI NAL units), the RTP timestamp MUST be set to the RTP timestamp of the coded picture of the access unit in which the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is included.
Receivers MUST use the RTP timestamp for the display process, even when the bitstream contains picture timing SEI messages or decoding unit information SEI messages as specified in [HEVC]. However, this does not mean that picture timing SEI messages in the bitstream should be discarded, as picture timing SEI messages may contain frame-field information that is important in appropriately rendering interlaced video.
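As an illustration of the 90 kHz rule, the following sketch derives a 32-bit RTP timestamp from a frame index and a frame rate. The function name, the use of a frame index as the sampling instant, and the initial_offset parameter (standing in for the random initial timestamp of [RFC3550]) are assumptions made for this example only.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def rtp_timestamp(frame_index: int, frame_rate: Fraction,
                  initial_offset: int = 0) -> int:
    """Map a sampling instant (frame_index / frame_rate seconds) to RTP ticks."""
    ticks = Fraction(frame_index) * RTP_CLOCK_HZ / frame_rate
    # RTP timestamps are 32 bits wide and wrap around.
    return (initial_offset + round(ticks)) & 0xFFFFFFFF
```

All NAL units of one access unit, including parameter sets and SEI NAL units without timing of their own, would carry the value computed for the coded picture of that access unit.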

Synchronization source (SSRC): 32 bits
Used to identify the source of the RTP packets. When using SRST, by definition a single SSRC is used for all parts of a single bitstream. In MRST or MRMT, different SSRCs are used for each RTP stream containing a subset of the sub-layers of the single (temporally scalable) bitstream. A receiver is required to correctly associate the set of SSRCs that carry parts of the same bitstream.

4.2. Payload Header Usage

The first two bytes of the payload of an RTP packet are referred to as the payload header. The payload header consists of the same fields (F, Type, LayerId, and TID) as the NAL unit header as shown in Section 1.1.4, irrespective of the type of the payload structure.
The TID value indicates (among other things) the relative importance of an RTP packet, for example, because NAL units belonging to higher temporal sub-layers are not used for the decoding of lower temporal sub-layers. A lower value of TID indicates a higher importance. More-important NAL units MAY be better protected against transmission losses than less-important NAL units.
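The two-byte payload header can be unpacked as sketched below. The field widths are those of the NAL unit header (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits, in that order, most significant bit first); the class and function names are hypothetical and chosen for this example.

```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    f: int          # forbidden_zero_bit
    nal_type: int   # Type (nal_unit_type)
    layer_id: int   # LayerId (nuh_layer_id)
    tid: int        # TID (nuh_temporal_id_plus1)


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Split the first two payload bytes into F, Type, LayerId, and TID."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-byte payload header")
    word = int.from_bytes(payload[:2], "big")
    return PayloadHeader(f=word >> 15,
                         nal_type=(word >> 9) & 0x3F,
                         layer_id=(word >> 3) & 0x3F,
                         tid=word & 0x07)


def pack_payload_header(header: PayloadHeader) -> bytes:
    """Inverse of parse_payload_header."""
    word = ((header.f << 15) | (header.nal_type << 9)
            | (header.layer_id << 3) | header.tid)
    return word.to_bytes(2, "big")
```

A MANE that has to drop packets could, for example, compare the tid values of queued packets and discard those with the highest value first.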


4.3. Transmission Modes

This memo enables transmission of an HEVC bitstream over:
o a Single RTP stream on a Single media Transport (SRST),
o Multiple RTP streams over a Single media Transport (MRST), or
o Multiple RTP streams on Multiple media Transports (MRMT).
Informative note: While this specification enables the use of MRST within the H.265 RTP payload, the signaling of MRST within SDP offer/answer is not fully specified at the time of this writing. See [RFC5576] and [RFC5583] for what is supported today as well as [RTP-MULTI-STREAM] and [SDP-NEG] for future directions.
When in MRMT, the dependency of one RTP stream on another RTP stream is typically indicated as specified in [RFC5583]. [RFC5583] can also be utilized to specify dependencies within MRST, but only if the RTP streams utilize distinct payload types.
SRST or MRST SHOULD be used for point-to-point unicast scenarios, whereas MRMT SHOULD be used for point-to-multipoint multicast scenarios where different receivers require different operation points of the same HEVC bitstream, to improve bandwidth utilization efficiency.
Informative note: A multicast may degrade to a unicast after all but one receiver have left (this is a justification of the first “SHOULD” instead of “MUST”), and there might be scenarios where MRMT is desirable but not possible, e.g., when IP multicast is not deployed in a certain network (this is a justification of the second “SHOULD” instead of “MUST”).
The transmission mode is indicated by the tx-mode media parameter (see Section 7.1). If tx-mode is equal to “SRST”, SRST MUST be used. Otherwise, if tx-mode is equal to “MRST”, MRST MUST be used. Otherwise (tx-mode is equal to “MRMT”), MRMT MUST be used.
Informative note: When an RTP stream does not depend on other RTP streams, any of SRST, MRST, or MRMT may be in use for the RTP stream.
Receivers MUST support all of SRST, MRST, and MRMT.
Informative note: The required support of MRMT by receivers does not imply that multicast must be supported by receivers.

4.4. Payload Structures

Four different types of RTP packet payload structures are specified. A receiver can identify the type of an RTP packet payload through the Type field in the payload header.
The four different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the payload, and the NAL unit header of the NAL unit also serves as the payload header. This payload structure is specified in Section 4.4.1.
o Aggregation Packet (AP): Contains more than one NAL unit within one access unit. This payload structure is specified in Section 4.4.2.
o Fragmentation Unit (FU): Contains a subset of a single NAL unit. This payload structure is specified in Section 4.4.3.
o PACI carrying RTP packet: Contains a payload header (that differs from other payload headers for efficiency), a Payload Header Extension Structure (PHES), and a PACI payload. This payload structure is specified in Section 4.4.4.
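A receiver's dispatch on the Type field might look like the following sketch. The values 48 (AP) and 49 (FU) are those given in Sections 4.4.2 and 4.4.3; the value 50 for PACI packets is taken from Section 4.4.4 of the specification and is not restated in this excerpt. The function name and the returned labels are hypothetical.

```python
def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-byte payload header")
    nal_type = (payload[0] >> 1) & 0x3F
    if nal_type == 48:
        return "AP"
    if nal_type == 49:
        return "FU"
    if nal_type == 50:  # per Section 4.4.4 of the specification
        return "PACI"
    # Any other Type value: the payload header is the NAL unit header itself.
    return "single NAL unit"
```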

4.4.1. Single NAL Unit Packets

A single NAL unit packet contains exactly one NAL unit, and consists of a payload header (denoted as PayloadHdr), a conditional 16-bit DONL field (in network byte order), and the NAL unit payload data (the NAL unit excluding its NAL unit header) of the contained NAL unit, as shown in Figure 3.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           PayloadHdr          |      DONL (conditional)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                  NAL unit payload data                        |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: The Structure of a Single NAL Unit Packet


The payload header SHOULD be an exact copy of the NAL unit header of the contained NAL unit. However, the Type (i.e., nal_unit_type) field MAY be changed, e.g., when it is desirable to handle a CRA picture as a BLA picture [JCTVC-J0107].
The DONL field, when present, specifies the value of the 16 least significant bits of the decoding order number of the contained NAL unit. If sprop-max-don-diff is greater than 0 for any of the RTP streams, the DONL field MUST be present, and the variable DON for the contained NAL unit is derived as equal to the value of the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP streams), the DONL field MUST NOT be present.
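The following sketch shows one way to build and parse a single NAL unit packet payload, with and without the DONL field. The function names are hypothetical; passing don=None (or donl_present=False) models the case where sprop-max-don-diff is equal to 0 for all RTP streams, and the NAL unit is assumed to be a byte string starting with its two-byte NAL unit header.

```python
from typing import Optional, Tuple


def packetize_single(nalu: bytes, don: Optional[int] = None) -> bytes:
    """Build the RTP payload of a single NAL unit packet."""
    if len(nalu) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    if don is None:
        return nalu  # PayloadHdr is simply the NAL unit header
    donl = (don & 0xFFFF).to_bytes(2, "big")  # 16 LSBs, network byte order
    return nalu[:2] + donl + nalu[2:]


def depacketize_single(payload: bytes,
                       donl_present: bool) -> Tuple[bytes, Optional[int]]:
    """Recover the NAL unit and, when signaled, its DON."""
    if not donl_present:
        return payload, None
    if len(payload) < 4:
        raise ValueError("payload too short to hold PayloadHdr and DONL")
    don = int.from_bytes(payload[2:4], "big")
    return payload[:2] + payload[4:], don
```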

4.4.2. Aggregation Packets (APs)

Aggregation Packets (APs) are introduced to enable the reduction of packetization overhead for small NAL units, such as most of the non-VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit to be carried in an AP is encapsulated in an aggregation unit. NAL units aggregated in one AP are in NAL unit decoding order.
An AP consists of a payload header (denoted as PayloadHdr) followed by two or more aggregation units, as shown in Figure 4.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PayloadHdr (Type=48)       |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|             two or more aggregation units                     |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: The Structure of an Aggregation Packet


The fields in the payload header are set as follows. The F bit MUST be equal to 0 if the F bit of each aggregated NAL unit is equal to zero; otherwise, it MUST be equal to 1. The Type field MUST be equal to 48. The value of LayerId MUST be equal to the lowest value of LayerId of all the aggregated NAL units. The value of TID MUST be the lowest value of TID of all the aggregated NAL units.

Informative note: All VCL NAL units in an AP have the same TID value since they belong to the same access unit. However, an AP may contain non-VCL NAL units for which the TID value in the NAL unit header may be different than the TID value of the VCL NAL units in the same AP.
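The rules for the AP payload header can be sketched as below: F is the logical OR of the F bits, Type is 48, and LayerId and TID are the minima over all aggregated NAL units. The function name is hypothetical, and each NAL unit is assumed to be a byte string starting with its two-byte NAL unit header.

```python
from typing import Sequence

AP_TYPE = 48


def ap_payload_header(nalus: Sequence[bytes]) -> bytes:
    """Derive the 2-byte PayloadHdr of an AP from the aggregated NAL units."""
    if len(nalus) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    f_bit, layer_id, tid = 0, 63, 7  # start from the largest field values
    for nalu in nalus:
        word = int.from_bytes(nalu[:2], "big")
        f_bit |= word >> 15                        # 1 if any F bit is 1
        layer_id = min(layer_id, (word >> 3) & 0x3F)
        tid = min(tid, word & 0x07)
    word = (f_bit << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    return word.to_bytes(2, "big")
```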
An AP MUST carry at least two aggregation units and can carry as many aggregation units as necessary; however, the total amount of data in an AP obviously MUST fit into an IP packet, and the size SHOULD be chosen so that the resulting IP packet is smaller than the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain FUs specified in Section 4.4.3. APs MUST NOT be nested; i.e., an AP must not contain another AP.
The first aggregation unit in an AP consists of a conditional 16-bit DONL field (in network byte order) followed by a 16-bit unsigned size information (in network byte order) that indicates the size of the NAL unit in bytes (excluding these two octets, but including the NAL unit header), followed by the NAL unit itself, including its NAL unit header, as shown in Figure 5.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :       DONL (conditional)      |   NALU size   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   NALU size   |                                               |
+-+-+-+-+-+-+-+-+         NAL unit                              |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: The Structure of the First Aggregation Unit in an AP


The DONL field, when present, specifies the value of the 16 least significant bits of the decoding order number of the aggregated NAL unit.
If sprop-max-don-diff is greater than 0 for any of the RTP streams, the DONL field MUST be present in an aggregation unit that is the first aggregation unit in an AP, and the variable DON for the aggregated NAL unit is derived as equal to the value of the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP streams), the DONL field MUST NOT be present in an aggregation unit that is the first aggregation unit in an AP.

An aggregation unit that is not the first aggregation unit in an AP consists of a conditional 8-bit DOND field followed by a 16-bit unsigned size information (in network byte order) that indicates the size of the NAL unit in bytes (excluding these two octets, but including the NAL unit header), followed by the NAL unit itself, including its NAL unit header, as shown in Figure 6.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                : DOND (cond)   |          NALU size            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                       NAL unit                                |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: The Structure of an Aggregation Unit That Is Not the First Aggregation Unit in an AP

When present, the DOND field plus 1 specifies the difference between the decoding order number values of the current aggregated NAL unit and the preceding aggregated NAL unit in the same AP.
If sprop-max-don-diff is greater than 0 for any of the RTP streams, the DOND field MUST be present in an aggregation unit that is not the first aggregation unit in an AP, and the variable DON for the aggregated NAL unit is derived as equal to the DON of the preceding aggregated NAL unit in the same AP plus the value of the DOND field plus 1 modulo 65536. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP streams), the DOND field MUST NOT be present in an aggregation unit that is not the first aggregation unit in an AP, and in this case the transmission order and decoding order of NAL units carried in the AP are the same as the order the NAL units appear in the AP.
Figure 7 presents an example of an AP that contains two aggregation units, labeled as 1 and 2 in the figure, without the DONL and DOND fields being present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   PayloadHdr (Type=48)        |         NALU 1 Size           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 HDR           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
|                   . . .                                       |
|                                                               |
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  . . .        | NALU 2 Size                   | NALU 2 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 2 HDR    |                                               |
+-+-+-+-+-+-+-+-+              NALU 2 Data                      |
|                   . . .                                       |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7: An Example of an AP Packet Containing Two Aggregation Units without the DONL and DOND Fields


Figure 8 presents an example of an AP that contains two aggregation units, labeled as 1 and 2 in the figure, with the DONL and DOND fields being present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   PayloadHdr (Type=48)        |        NALU 1 DONL            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 Size          |            NALU 1 HDR         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                 NALU 1 Data   . . .                           |
|                                                               |
+     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |  NALU 2 DOND  |          NALU 2 Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 2 HDR           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
|                                                               |
|        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8: An Example of an AP Containing Two Aggregation Units with the DONL and DOND Fields
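The aggregation unit layout and the DON derivation described in this section can be exercised with the following depacketization sketch. It is an illustration under stated assumptions: the function name is hypothetical, the input is the RTP payload with any RTP padding already removed, and don_present is True exactly when sprop-max-don-diff is greater than 0 for any of the RTP streams.

```python
from typing import List, Optional, Tuple


def parse_ap(payload: bytes,
             don_present: bool) -> List[Tuple[Optional[int], bytes]]:
    """Return (DON, NAL unit) pairs in order of appearance in the AP."""
    pos = 2  # skip PayloadHdr (Type=48)
    units: List[Tuple[Optional[int], bytes]] = []
    don: Optional[int] = None
    while pos < len(payload):
        if don_present:
            if not units:  # first aggregation unit: 16-bit DONL
                if pos + 2 > len(payload):
                    raise ValueError("truncated DONL field")
                don = int.from_bytes(payload[pos:pos + 2], "big")
                pos += 2
            else:          # later aggregation units: 8-bit DOND
                don = (don + payload[pos] + 1) % 65536
                pos += 1
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        size = int.from_bytes(payload[pos:pos + 2], "big")
        pos += 2
        if size < 2 or pos + size > len(payload):
            raise ValueError("NALU size does not fit the payload")
        units.append((don, payload[pos:pos + size]))
        pos += size
    if len(units) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    return units
```

Without DON fields, the order of the returned list is both the transmission order and the decoding order of the aggregated NAL units.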


4.4.3. Fragmentation Units

Fragmentation Units (FUs) are introduced to enable fragmenting a single NAL unit into multiple RTP packets, possibly without cooperation or knowledge of the HEVC encoder. A fragment of a NAL unit consists of an integer number of consecutive octets of that NAL unit. Fragments of the same NAL unit MUST be sent in consecutive order with ascending RTP sequence numbers (with no other RTP packets within the same RTP stream being sent between the first and last fragment).
When a NAL unit is fragmented and conveyed within FUs, it is referred to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST NOT be nested; i.e., an FU must not contain a subset of another FU.
The RTP timestamp of an RTP packet carrying an FU is set to the NALU-time of the fragmented NAL unit.

An FU consists of a payload header (denoted as PayloadHdr), an FU header of one octet, a conditional 16-bit DONL field (in network byte order), and an FU payload, as shown in Figure 9.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
| DONL (cond)   |                                               |
|-+-+-+-+-+-+-+-+                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 9: The Structure of an FU


The fields in the payload header are set as follows. The Type field MUST be equal to 49. The fields F, LayerId, and TID MUST be equal to the fields F, LayerId, and TID, respectively, of the fragmented NAL unit.
The FU header consists of an S bit, an E bit, and a 6-bit FuType field, as shown in Figure 10.

+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|  FuType   |
+---------------+
Figure 10: The Structure of FU Header


The semantics of the FU header fields are as follows:
S: 1 bit
When set to 1, the S bit indicates the start of a fragmented NAL unit, i.e., the first byte of the FU payload is also the first byte of the payload of the fragmented NAL unit. When the FU payload is not the start of the fragmented NAL unit payload, the S bit MUST be set to 0.

E: 1 bit
When set to 1, the E bit indicates the end of a fragmented NAL unit, i.e., the last byte of the payload is also the last byte of the fragmented NAL unit. When the FU payload is not the last fragment of a fragmented NAL unit, the E bit MUST be set to 0.

FuType: 6 bits
The field FuType MUST be equal to the field Type of the fragmented NAL unit.
The DONL field, when present, specifies the value of the 16 least significant bits of the decoding order number of the fragmented NAL unit.
If sprop-max-don-diff is greater than 0 for any of the RTP streams, and the S bit is equal to 1, the DONL field MUST be present in the FU, and the variable DON for the fragmented NAL unit is derived as equal to the value of the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP streams, or the S bit is equal to 0), the DONL field MUST NOT be present in the FU.
A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the Start bit and End bit must not both be set to 1 in the same FU header.
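The FU header layout and the DONL presence rule above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function names are hypothetical, and the bit positions follow Figure 10 (S is the most significant bit of the octet).

```python
def pack_fu_header(start: bool, end: bool, fu_type: int) -> int:
    """Build the one-octet FU header: S, E, then the 6-bit FuType."""
    if start and end:
        # A non-fragmented NAL unit must not be sent in one FU.
        raise ValueError("S and E must not both be set in the same FU header")
    if not 0 <= fu_type <= 63:
        raise ValueError("FuType is a 6-bit field")
    return (int(start) << 7) | (int(end) << 6) | fu_type


def parse_fu_header(octet: int) -> tuple[bool, bool, int]:
    """Split an FU header octet into (S, E, FuType)."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x3F


def donl_present(sprop_max_don_diff: int, start: bool) -> bool:
    """DONL is carried only in the first FU, and only when sprop-max-don-diff > 0."""
    return sprop_max_don_diff > 0 and start
```

Here `sprop_max_don_diff` stands for the largest value signaled for any of the RTP streams, since the rule applies as soon as one stream uses a value greater than 0.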

If an FU is lost, the receiver SHOULD discard all following fragmentation units in transmission order corresponding to the same fragmented NAL unit, unless the decoder in the receiver is known to be prepared to gracefully handle incomplete NAL units.
A receiver in an endpoint or in a MANE MAY aggregate the first n-1 fragments of a NAL unit to an (incomplete) NAL unit, even if fragment n of that NAL unit is not received. In this case, the forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a syntax violation.

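The FU payload consists of fragments of the payload of the fragmented NAL unit so that if the FU payloads of consecutive FUs, starting with an FU with the S bit equal to 1 and ending with an FU with the E bit equal to 1, are sequentially concatenated, the payload of the fragmented NAL unit can be reconstructed. The NAL unit header of the fragmented NAL unit is not included as such in the FU payload; instead, the information of the NAL unit header of the fragmented NAL unit is conveyed in the F, LayerId, and TID fields of the FU payload headers of the FUs and the FuType field of the FU header of the FUs. An FU payload MUST NOT be empty.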
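As a rough sketch of how fragmentation and reassembly fit together, the code below splits a NAL unit into FUs and rebuilds it. It assumes sprop-max-don-diff is 0 (so no DONL field), the two-octet NAL unit header layout F (1 bit), Type (6 bits), LayerId (6 bits), TID (3 bits), and that the caller passes FUs in transmission order; `fragment_nal_unit` and `reassemble` are hypothetical names.

```python
FU_TYPE = 49  # Type value of the FU payload header


def fragment_nal_unit(nal: bytes, max_fu_payload: int) -> list[bytes]:
    """Split one NAL unit into FU packet payloads (no DONL field)."""
    header, body = nal[:2], nal[2:]
    if max_fu_payload < 1 or len(body) <= max_fu_payload:
        raise ValueError("NAL unit must span at least two FUs")
    nal_type = (header[0] >> 1) & 0x3F
    # PayloadHdr keeps F, LayerId and TID of the NAL unit; Type becomes 49.
    payload_hdr = bytes([(header[0] & 0x81) | (FU_TYPE << 1), header[1]])
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    fus = []
    for i, chunk in enumerate(chunks):
        s = 0x80 if i == 0 else 0
        e = 0x40 if i == len(chunks) - 1 else 0
        fus.append(payload_hdr + bytes([s | e | nal_type]) + chunk)
    return fus


def reassemble(fus: list[bytes]) -> bytes:
    """Rebuild a NAL unit from consecutive FUs that start with S=1.

    If the last FU supplied does not have E=1, the result is incomplete
    and forbidden_zero_bit (F) is set to 1 to flag the syntax violation.
    """
    first = fus[0]
    if not first[2] & 0x80:
        raise ValueError("first FU must have S=1")
    nal_type = first[2] & 0x3F
    byte0 = (first[0] & 0x81) | (nal_type << 1)
    if not fus[-1][2] & 0x40:
        byte0 |= 0x80
    return bytes([byte0, first[1]]) + b"".join(fu[3:] for fu in fus)
```

A receiver that cannot handle incomplete NAL units would simply drop the remaining fragments instead of calling `reassemble` on a truncated list.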

4.4.4. PACI Packets

This section specifies the PACI packet structure. The basic payload header specified in this memo is intentionally limited to the 16 bits of the NAL unit header so as to keep the packetization overhead to a minimum. However, cases have been identified where it is advisable to include control information in an easily accessible position in the packet header, despite the additional overhead. One such piece of control information is the TSCI as specified in Section 4.5. PACI packets carry this and future, similar structures.
The PACI packet structure is based on a payload header extension mechanism that is generic and extensible to carry payload header extensions. In this section, the focus lies on the use within this specification. Section 4.4.4.2 provides guidance for the specification designers in how to employ the extension mechanism in future specifications.
A PACI packet consists of a payload header (denoted as PayloadHdr), for which the structure follows what is described in Section 4.2. The payload header is followed by the fields A, cType, PHSsize, F[0..2], and Y.

Figure 11 shows a PACI packet in compliance with this memo, i.e., without any extensions.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Payload Header Extension Structure (PHES)              |
|=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
|                                                               |
|                  PACI payload: NAL unit                       |
|                   . . .                                       |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 11: The Structure of a PACI


The fields in the payload header are set as follows. The F bit MUST be equal to 0. The Type field MUST be equal to 50. The value of LayerId MUST be a copy of the LayerId field of the PACI payload NAL unit or NAL-unit-like structure. The value of TID MUST be a copy of the TID field of the PACI payload NAL unit or NAL-unit-like structure.

The semantics of other fields are as follows:

A: 1 bit Copy of the F bit of the PACI payload NAL unit or NAL-unit-like structure.
cType: 6 bits Copy of the Type field of the PACI payload NAL unit or NAL-unit-like structure.
PHSsize: 5 bits Indicates the length of the PHES field. The value is limited to be less than or equal to 32 octets, to simplify encoder design for MTU size matching.
F0: This field equal to 1 specifies the presence of a temporal scalability support extension in the PHES.
F1, F2: MUST be 0, available for future extensions, see Section 4.4.4.2. Receivers compliant with this version of the HEVC payload format MUST ignore F1=1 and/or F2=1, and also ignore any information in the PHES indicated as present by F1=1 and/or F2=1.
Informative note: The receiver can do that by first decoding information associated with F0=1, and then skipping over any remaining bytes of the PHES based on the value of PHSsize.
Y: 1 bit MUST be 0, available for future extensions, see Section 4.4.4.2. Receivers compliant with this version of the HEVC payload format MUST ignore Y=1, and also ignore any information in the PHES indicated as present by Y.
PHES: variable number of octets A variable number of octets as indicated by the value of PHSsize.

PACI Payload: The single NAL unit packet or NAL-unit-like structure (such as: FU or AP) to be carried, not including the first two octets.
Informative note: The first two octets of the NAL unit or NAL-unit-like structure carried in the PACI payload are not included in the PACI payload. Rather, the respective values are copied in locations of the PayloadHdr of the RTP packet. This design offers two advantages: first, the overall structure of the payload header is preserved, i.e., there is no special case of payload header structure that needs to be implemented for PACI. Second, no additional overhead is introduced.
A PACI payload MAY be a single NAL unit, an FU, or an AP. PACIs MUST NOT be fragmented or aggregated. The following subsection documents the reasons for these design choices.
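The field layout of Figure 11 could be parsed roughly as shown below. This is a sketch under stated assumptions: the `Paci` container and `parse_paci` are hypothetical names, and the carried structure's two header octets are restored by taking F from A, Type from cType, and LayerId and TID from the PayloadHdr, as the field semantics above describe.

```python
from dataclasses import dataclass


@dataclass
class Paci:
    a: int
    ctype: int
    phs_size: int
    f0: int
    f1: int
    f2: int
    y: int
    phes: bytes
    inner: bytes  # carried NAL unit, FU or AP with its two header octets restored


def parse_paci(payload: bytes) -> Paci:
    """Parse an RTP payload whose PayloadHdr has Type equal to 50."""
    if len(payload) < 4 or (payload[0] >> 1) & 0x3F != 50:
        raise ValueError("not a PACI packet")
    fields = int.from_bytes(payload[2:4], "big")
    a = fields >> 15
    ctype = (fields >> 9) & 0x3F
    phs_size = (fields >> 4) & 0x1F
    phes = payload[4:4 + phs_size]
    if len(phes) != phs_size:
        raise ValueError("PHES is truncated")
    # Restore the header: F <- A, Type <- cType, LayerId/TID from PayloadHdr.
    byte0 = (a << 7) | (ctype << 1) | (payload[0] & 0x01)
    inner = bytes([byte0, payload[1]]) + payload[4 + phs_size:]
    return Paci(a, ctype, phs_size, (fields >> 3) & 1, (fields >> 2) & 1,
                (fields >> 1) & 1, fields & 1, phes, inner)
```

A receiver that does not understand the PHES can still use `inner`, because PHSsize tells it how many octets to skip.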

4.4.4.1. Reasons for the PACI Rules (Informative)

A PACI cannot be fragmented. If a PACI could be fragmented, and a fragment other than the first fragment got lost, access to the information in the PACI would not be possible. Therefore, a PACI must not be fragmented. In other words, an FU must not carry (fragments of) a PACI.
A PACI cannot be aggregated. Aggregation of PACIs is inadvisable from a compression viewpoint, as, in many cases, several to be aggregated NAL units would share identical PACI fields and values which would be carried redundantly for no reason. Most, if not all, of the practical effects of PACI aggregation can be achieved by aggregating NAL units and bundling them with a PACI (see below). Therefore, a PACI must not be aggregated. In other words, an AP must not contain a PACI.

The payload of a PACI can be a fragment. Both middleboxes and sending systems with inflexible (often hardware-based) encoders occasionally find themselves in situations where a PACI and its headers, combined, are larger than the MTU size. In such a scenario, the middlebox or sender can fragment the NAL unit and encapsulate the fragment in a PACI. Doing so preserves the payload header extension information for all fragments, allowing downstream middleboxes and the receiver to take advantage of that information. Therefore, a sender may place a fragment into a PACI, and a receiver must be able to handle such a PACI.
The payload of a PACI can be an aggregation NAL unit. HEVC bitstreams can contain unevenly sized and/or small (when compared to the MTU size) NAL units. In order to efficiently packetize such small NAL units, APs were introduced. The benefits of APs are independent from the need for a payload header extension. Therefore, a sender may place an AP into a PACI, and a receiver must be able to handle such a PACI.

4.4.4.2. PACI Extensions (Informative)

This section includes recommendations for future specification designers on how to extend the PACI syntax to accommodate future extensions. Obviously, designers are free to specify whatever appears to be appropriate to them at the time of their design. However, a lot of thought has been invested into the extension mechanism described below, and we suggest that deviations from it warrant a good explanation.
This memo defines only a single payload header extension (TSCI, described in Section 4.5); therefore, only the F0 bit carries semantics. F1 and F2 are already named (and not just marked as reserved, as a typical video spec designer would do). They are intended to signal two additional extensions. The Y bit allows one to, recursively, add further F and Y bits to extend the mechanism beyond three possible payload header extensions. It is suggested to define a new packet type (using a different value for Type) when assigning the F1, F2, or Y bits different semantics than what is suggested below.

When a Y bit is set, an 8-bit flag-extension is inserted after the Y bit. A flag-extension consists of 7 flags F[n..n+6], and another Y bit.
The basic PACI header already includes F0, F1, and F2. Therefore, the Fx bits in the first flag-extensions are numbered F3, F4, …, F9; the F bits in the second flag-extension are numbered F10, F11, …, F16, and so forth. As a result, at least three Fx bits are always in the PACI, but the number of Fx bits (and associated types of extensions) can be increased by setting the next Y bit and adding an octet of flag-extensions, carrying seven flags and another Y bit. The size of this list of flags is subject to the limits specified in Section 4.4.4 (32 octets for all flag-extensions and the PHES information combined).
Each of the F bits can indicate either the presence or the absence of certain information in the Payload Header Extension Structure (PHES).
When a spec developer devises a new syntax that takes advantage of the PACI extension mechanism, he/she must follow the constraints listed below; otherwise, the extension mechanism may break.

1. The fields added for a particular Fx bit MUST be fixed in length and not depend on what other Fx bits are set (no parsing dependency).
2. The Fx bits must be assigned in order.
3. An implementation that supports the n-th Fn bit for any value of n must understand the syntax (though not necessarily the semantics) of the fields Fk (with k < n), so as to be able to either use those bits when present, or at least be able to skip over them.
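To make the recursive flag-extension idea concrete, here is a hypothetical reader for the F and Y bits. It is only a sketch of the suggested mechanism: this memo defines no flag-extensions, and the code assumes that each flag-extension octet directly follows the preceding Y bit and carries its seven flags most significant bit first, with Y in the least significant bit.

```python
def read_paci_flags(data: bytes) -> tuple[list[int], int]:
    """Collect all F bits of a PACI.

    `data` starts at the octet right after PayloadHdr. Returns the list
    [F0, F1, F2, ...] and the offset of the first octet after the flags.
    """
    fields = int.from_bytes(data[:2], "big")
    flags = [(fields >> 3) & 1, (fields >> 2) & 1, (fields >> 1) & 1]
    y = fields & 1
    pos = 2
    while y:
        octet = data[pos]
        # Seven flags F[n..n+6], then the next Y bit.
        flags.extend((octet >> shift) & 1 for shift in range(7, 0, -1))
        y = octet & 1
        pos += 1
    return flags, pos
```

Because the fields added for each Fx bit are fixed in length and independent of other flags, a parser built this way can skip the fields of flags it does not understand.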

4.5. Temporal Scalability Control Information

This section describes the single payload header extension defined in this specification, known as TSCI. If, in the future, additional payload header extensions become necessary, they could be specified in this section of an updated version of this document, or in their own documents.
When F0 is set to 1 in a PACI, this specifies that the PHES field includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                           ....                                |
|               PACI payload: NAL unit                          |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 12: The Structure of a PACI with a PHES Containing a TSCI


TL0PICIDX (8 bits) When present, the TL0PICIDX field MUST be set equal to temporal_sub_layer_zero_idx as specified in Section D.3.22 of [HEVC] for the access unit containing the NAL unit in the PACI.
IrapPicID (8 bits) When present, the IrapPicID field MUST be set equal to irap_pic_id as specified in Section D.3.22 of [HEVC] for the access unit containing the NAL unit in the PACI.

S (1 bit) The S bit MUST be set to 1 if any of the following conditions is true and MUST be set to 0 otherwise:
o The NAL unit in the payload of the PACI is the first VCL NAL unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an AP, and the NAL unit in the first contained aggregation unit is the first VCL NAL unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an FU with its S bit equal to 1 and the FU payload containing a fragment of the first VCL NAL unit, in decoding order, of a picture.

E (1 bit) The E bit MUST be set to 1 if any of the following conditions is true and MUST be set to 0 otherwise:
o The NAL unit in the payload of the PACI is the last VCL NAL unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an AP and the NAL unit in the last contained aggregation unit is the last VCL NAL unit, in decoding order, of a picture.
o The NAL unit in the payload of the PACI is an FU with its E bit equal to 1 and the FU payload containing a fragment of the last VCL NAL unit, in decoding order, of a picture.

RES (6 bits) MUST be equal to 0. Reserved for future extensions.
The value of PHSsize MUST be set to 3. Receivers MUST allow other values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any additional fields, when present, than specified above in the PHES.
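A receiver could read the three TSCI octets of Figure 12 along these lines. The sketch assumes the PHES has already been extracted from the PACI and that F0 is 1; `parse_tsci` and the dictionary keys are hypothetical names. Octets beyond the first three are ignored, in line with the rule that receivers ignore additional PHES fields.

```python
def parse_tsci(phes: bytes) -> dict[str, int]:
    """Decode TL0PICIDX, IrapPicID, S, E and RES from the start of the PHES."""
    if len(phes) < 3:
        raise ValueError("TSCI needs at least 3 octets of PHES")
    return {
        "tl0picidx": phes[0],
        "irap_pic_id": phes[1],
        "s": phes[2] >> 7,          # first VCL NAL unit of a picture
        "e": (phes[2] >> 6) & 1,    # last VCL NAL unit of a picture
        "res": phes[2] & 0x3F,      # reserved, expected to be 0
    }
```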

4.6. Decoding Order Number
For each NAL unit, the variable AbsDon is derived, representing the decoding order number that is indicative of the NAL unit decoding order.
Let NAL unit n be the n-th NAL unit in transmission order within an RTP stream.

If sprop-max-don-diff is equal to 0 for all the RTP streams carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for NAL unit n, is derived as equal to n.
Otherwise (sprop-max-don-diff is greater than 0 for any of the RTP streams), AbsDon[n] is derived as follows, where DON[n] is the value of the variable DON for NAL unit n:
o If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for derivation of AbsDon[n]:
If DON[n] == DON[n-1], AbsDon[n] = AbsDon[n-1]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
For any two NAL units m and n, the following applies:
o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows NAL unit m in NAL unit decoding order.
o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order of the two NAL units can be in either order.
o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes NAL unit m in decoding order.
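The AbsDon derivation for the case where sprop-max-don-diff is greater than 0 can be written out as a small function. This is an illustrative sketch with hypothetical names (`next_abs_don`, `abs_dons`); DON values are assumed to be 16-bit unsigned integers listed in transmission order.

```python
def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1] and DON[n]."""
    if don == prev_don:
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        return prev_abs_don + don - prev_don            # small step forward
    if don < prev_don and prev_don - don >= 32768:
        return prev_abs_don + 65536 - prev_don + don    # forward across wrap
    if don > prev_don and don - prev_don >= 32768:
        return prev_abs_don - (prev_don + 65536 - don)  # backward across wrap
    return prev_abs_don - (prev_don - don)              # small step backward


def abs_dons(dons: list[int]) -> list[int]:
    """Compute AbsDon for every NAL unit; AbsDon[0] equals DON[0]."""
    out: list[int] = []
    for n, don in enumerate(dons):
        out.append(don if n == 0 else next_abs_don(out[-1], dons[n - 1], don))
    return out
```

For example, the DON sequence 65534, 65535, 0, 1 yields AbsDon values 65534, 65535, 65536, 65537, so the 16-bit wraparound does not disturb the decoding order.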

Informative note: When two consecutive NAL units in the NAL unit decoding order have different values of AbsDon, the absolute difference between the two AbsDon values may be greater than or equal to 1.
Informative note: There are multiple reasons to allow for the absolute difference of the values of AbsDon for two consecutive NAL units in the NAL unit decoding order to be greater than one. An increment by one is not required, as at the time of associating values of AbsDon to NAL units, it may not be known whether all NAL units are to be delivered to the receiver. For example, a gateway may not forward VCL NAL units of higher sublayers or some SEI NAL units when there is congestion in the network. In another example, the first intra-coded picture of a pre-encoded clip is transmitted in advance to ensure that it is readily available in the receiver, and when transmitting the first intra-coded picture, the originator does not exactly know how many NAL units will be encoded before the first intra-coded picture of the pre-encoded clip follows in decoding order. Thus, the values of AbsDon for the NAL units of the first intra-coded picture of the pre-encoded clip have to be estimated when they are transmitted, and gaps in values of AbsDon may occur. Another example is MRST or MRMT with sprop-max-don-diff greater than 0, where the AbsDon values must indicate cross-layer decoding order for NAL units conveyed in all the RTP streams.

5 Packetization Rules

The following packetization rules apply:
o If sprop-max-don-diff is greater than 0 for any of the RTP streams, the transmission order of NAL units carried in the RTP stream MAY be different than the NAL unit decoding order and the NAL unit output order. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP streams), the transmission order of NAL units carried in the RTP stream MUST be the same as the NAL unit decoding order and, when tx-mode is equal to “MRST” or “MRMT”, MUST also be the same as the NAL unit output order.
o A NAL unit of a small size SHOULD be encapsulated in an aggregation packet together with one or more other NAL units in order to avoid the unnecessary packetization overhead for small NAL units. For example, non-VCL NAL units such as access unit delimiters, parameter sets, or SEI NAL units are typically small and can often be aggregated with VCL NAL units without violating MTU size constraints.
o Each non-VCL NAL unit SHOULD, when possible from an MTU size match viewpoint, be encapsulated in an aggregation packet together with its associated VCL NAL unit, as typically a non-VCL NAL unit would be meaningless without the associated VCL NAL unit being available.
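One way to apply the aggregation recommendations is a greedy planner that groups consecutive NAL units while they fit into the payload budget. The sketch below is an assumption-heavy illustration: `plan_packets` is a hypothetical helper, sprop-max-don-diff is assumed to be 0 (no DONL or DOND fields), and the aggregation packet overhead is taken as a 2-octet payload header plus a 16-bit size field per aggregated NAL unit, as defined for APs elsewhere in this memo.

```python
def plan_packets(nal_sizes: list[int], max_payload: int) -> list[tuple[str, list[int]]]:
    """Return (kind, nal_indices) per packet; kind is 'single', 'ap' or 'fu'."""
    plan: list[tuple[str, list[int]]] = []
    group: list[int] = []
    group_bytes = 2  # AP payload header

    def flush() -> None:
        nonlocal group, group_bytes
        if len(group) == 1:
            plan.append(("single", group))   # an AP needs two or more units
        elif group:
            plan.append(("ap", group))
        group, group_bytes = [], 2

    for i, size in enumerate(nal_sizes):
        if size > max_payload:
            flush()
            plan.append(("fu", [i]))         # too large: fragment it
            continue
        unit = 2 + size                      # 16-bit size field + NAL unit
        if group and group_bytes + unit > max_payload:
            flush()
        group.append(i)
        group_bytes += unit
    flush()
    return plan
```

Because only consecutive NAL units are grouped, the transmission order stays equal to the decoding order, and small non-VCL NAL units naturally end up next to the VCL NAL unit that follows them when the budget allows.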

6 De-packetization Process

The general concept behind de-packetization is to get the NAL units out of the RTP packets in an RTP stream and all RTP streams the RTP stream depends on, if any, and pass them to the decoder in the NAL unit decoding order.
The de-packetization process is implementation dependent. Therefore, the following description should be seen as an example of a suitable implementation. Other schemes may be used as well, as long as the output for the same input is the same as the process described below. The output is the same when the set of output NAL units and their order are both identical. Optimizations relative to the described algorithms are possible.

All normal RTP mechanisms related to buffer management apply. In particular, duplicated or outdated RTP packets (as indicated by the RTP sequence number and the RTP timestamp) are removed. To determine the exact time for decoding, factors such as a possible intentional delay to allow for proper inter-stream synchronization must be factored in.
NAL units with NAL unit type values in the range of 0 to 47, inclusive, may be passed to the decoder. NAL-unit-like structures with NAL unit type values in the range of 48 to 63, inclusive, MUST NOT be passed to the decoder.
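The type check above amounts to reading the 6-bit Type field of the two-octet header. The following sketch assumes the header layout F (1 bit), Type (6 bits), LayerId (6 bits), TID (3 bits); the function names are hypothetical.

```python
def parse_nal_header(data: bytes) -> tuple[int, int, int, int]:
    """Return (F, Type, LayerId, TID) from the first two octets."""
    f = data[0] >> 7
    nal_type = (data[0] >> 1) & 0x3F
    layer_id = ((data[0] & 0x01) << 5) | (data[1] >> 3)
    tid = data[1] & 0x07
    return f, nal_type, layer_id, tid


def may_pass_to_decoder(data: bytes) -> bool:
    """Types 0..47 are NAL units; 48..63 are payload structures such as AP, FU, PACI."""
    return parse_nal_header(data)[1] <= 47
```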
The receiver includes a receiver buffer, which is used to compensate for transmission delay jitter within individual RTP streams and across RTP streams, to reorder NAL units from transmission order to the NAL unit decoding order, and to recover the NAL unit decoding order in MRST or MRMT, when applicable. In this section, the receiver operation is described under the assumption that there is no transmission delay jitter within an RTP stream and across RTP streams. To distinguish it from a practical receiver buffer, which is also used to compensate for transmission delay jitter, the receiver buffer is hereafter called the de-packetization buffer in this section. Receivers should also prepare for transmission delay jitter; that is, either reserve separate buffers for transmission delay jitter buffering and de-packetization buffering or use a receiver buffer for both transmission delay jitter and de-packetization. Moreover, receivers should take transmission delay jitter into account in the buffering operation, e.g., by additional initial buffering before decoding and playback start.

When sprop-max-don-diff is equal to 0 for all the received RTP streams, the de-packetization buffer size is zero bytes, and the process described in the remainder of this paragraph applies. When there is only one RTP stream received, the NAL units carried in the single RTP stream are directly passed to the decoder in their transmission order, which is identical to their decoding order. When there is more than one RTP stream received, the NAL units carried in the multiple RTP streams are passed to the decoder in their NTP timestamp order. When there are several NAL units of different RTP streams with the same NTP timestamp, the order to pass them to the decoder is their dependency order, where NAL units of a dependee RTP stream are passed to the decoder prior to the NAL units of the dependent RTP stream. When there are several NAL units of the same RTP stream with the same NTP timestamp, the order to pass them to the decoder is their transmission order.

Informative note: The mapping between RTP and NTP timestamps is conveyed in RTCP SR packets. In addition, the mechanisms for faster media timestamp synchronization discussed in [RFC6051] may be used to speed up the acquisition of the RTP-to-wall-clock mapping.
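For the multi-stream case with sprop-max-don-diff equal to 0, the ordering rules reduce to a three-part sort key. The sketch below uses a hypothetical `ReceivedNal` record; `dep_rank` is an assumed helper value that is 0 for a stream depending on no other stream and larger for streams that depend on it, and `ntp_time` is assumed to be already derived from the RTP timestamp.

```python
from typing import NamedTuple


class ReceivedNal(NamedTuple):
    ntp_time: float  # wall-clock time derived from the RTP timestamp via RTCP SR
    dep_rank: int    # 0 for a stream that depends on no other stream
    seq: int         # position in transmission order within its own RTP stream
    data: bytes


def decoder_order(nals: list[ReceivedNal]) -> list[ReceivedNal]:
    """Order by NTP timestamp, then dependency order, then transmission order."""
    return sorted(nals, key=lambda n: (n.ntp_time, n.dep_rank, n.seq))
```

A real receiver would apply this incrementally as packets arrive rather than sorting a complete list; the key itself stays the same.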
When sprop-max-don-diff is greater than 0 for any of the received RTP streams, the process described in the remainder of this section applies.
There are two buffering states in the receiver: initial buffering and buffering while playing. Initial buffering starts when the reception is initialized. After initial buffering, decoding and playback are started, and the buffering-while-playing mode is used.

Regardless of the buffering state, the receiver stores incoming NAL units, in reception order, into the de-packetization buffer. NAL units carried in RTP packets are stored in the de-packetization buffer individually, and the value of AbsDon is calculated and stored for each NAL unit. When MRST or MRMT is in use, NAL units of all RTP streams of a bitstream are stored in the same de-packetization buffer. When NAL units carried in any two RTP streams are available to be placed into the de-packetization buffer, those NAL units carried in the RTP stream that is lower in the dependency tree are placed into the buffer first. For example, if RTP stream A depends on RTP stream B, then NAL units carried in RTP stream B are placed into the buffer first.

Initial buffering lasts until condition A (the difference between the greatest and smallest AbsDon values of the NAL units in the de-packetization buffer is greater than or equal to the value of sprop-max-don-diff of the highest RTP stream) or condition B (the number of NAL units in the de-packetization buffer is greater than the value of sprop-depack-buf-nalus) is true.
After initial buffering, whenever condition A or condition B is true, the following operation is repeatedly applied until both condition A and condition B become false:
o The NAL unit in the de-packetization buffer with the smallest value of AbsDon is removed from the de-packetization buffer and passed to the decoder.
When no more NAL units are flowing into the de-packetization buffer, all NAL units remaining in the de-packetization buffer are removed from the buffer and passed to the decoder in the order of increasing AbsDon values.
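The buffering process described above can be modeled with a min-heap keyed on AbsDon. This is only one possible sketch, with a hypothetical `DepacketizationBuffer` class; it ignores jitter compensation, expects the caller to supply AbsDon for each NAL unit, and treats initial buffering and buffering while playing alike, since both release NAL units exactly when condition A or condition B holds.

```python
import heapq


class DepacketizationBuffer:
    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.max_nalus = sprop_depack_buf_nalus
        self._heap: list[tuple[int, int, object]] = []
        self._arrivals = 0  # tie-breaker so NAL units themselves are never compared

    def _condition_a(self) -> bool:
        if not self._heap:
            return False
        greatest = max(entry[0] for entry in self._heap)
        return greatest - self._heap[0][0] >= self.max_don_diff

    def _condition_b(self) -> bool:
        return len(self._heap) > self.max_nalus

    def push(self, abs_don: int, nal: object) -> list[object]:
        """Store one NAL unit and return those that are now due for decoding."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal))
        self._arrivals += 1
        out = []
        while self._condition_a() or self._condition_b():
            out.append(heapq.heappop(self._heap)[2])
        return out

    def flush(self) -> list[object]:
        """Drain the buffer in increasing AbsDon order when reception ends."""
        return [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]
```

With sprop-max-don-diff set to 2, pushing NAL units with AbsDon 0, 2, 1, 3 releases them to the decoder in the order 0, 1, and then 2, 3 on flush.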

7 Payload Format Parameters

This section specifies the parameters that MAY be used to select optional features of the payload format and certain features or properties of the bitstream or the RTP stream. The parameters are specified here as part of the media type registration for the HEVC codec. A mapping of the parameters into the Session Description Protocol (SDP) [RFC4566] is also provided for applications that use SDP. Equivalent parameters could be defined elsewhere for use with control protocols that do not use SDP.

7.1. Media Type Registration

The media subtype for the HEVC codec is allocated from the IETF tree.
The receiver MUST ignore any unrecognized parameter.

Type name: video
Subtype name: H265
Required parameters: none
OPTIONAL parameters:
profile-space, tier-flag, profile-id, profile-compatibility-indicator, interop-constraints, and level-id:

These parameters indicate the profile, tier, default level, and some constraints of the bitstream carried by the RTP stream and all RTP streams the RTP stream depends on, or a specific set of the profile, tier, default level, and some constraints the receiver supports.
The profile and some constraints are indicated collectively by profile-space, profile-id, profile-compatibility-indicator, and interop-constraints. The profile specifies the subset of coding tools that may have been used to generate the bitstream or that the receiver supports.

Informative note: There are 32 values of profile-id, and there are 32 flags in profile-compatibility-indicator, each flag corresponding to one value of profile-id. According to HEVC version 1 in [HEVC], when more than one of the 32 flags is set for a bitstream, the bitstream would comply with all the profiles corresponding to the set flags. However, in a draft of HEVC version 2 in [HEVCv2], Subclause A.3.5, 19 Format Range Extensions profiles have been specified, all using the same value of profile-id (4), differentiated by some of the 48 bits in interop-constraints; this (rather unexpected way of profile signaling) means that one of the 32 flags may correspond to multiple profiles. To be able to support whatever HEVC extension profile that might be specified and indicated using profile-space, profile-id, profile-compatibility-indicator, and interop-constraints in the future, it would be safe to require symmetric use of these parameters in SDP offer/answer unless recv-sub-layer-id is included in the SDP answer for choosing one of the sub-layers offered.

The tier is indicated by tier-flag. The default level is indicated by level-id. The tier and the default level specify the limits on values of syntax elements or arithmetic combinations of values of syntax elements that are followed when generating the bitstream or that the receiver supports.
A set of profile-space, tier-flag, profile-id, profile-compatibility-indicator, interop-constraints, and level-id parameters ptlA is said to be consistent with another set of these parameters ptlB if any decoder that conforms to the profile, tier, level, and constraints indicated by ptlB can decode any bitstream that conforms to the profile, tier, level, and constraints indicated by ptlA.

In SDP offer/answer, when the SDP answer does not include the recv-sub-layer-id parameter that is less than the sprop-sub-layer-id parameter in the SDP offer, the following applies:
o The profile-space, tier-flag, profile-id, profile-compatibility-indicator, and interop-constraints parameters MUST be used symmetrically, i.e., the value of each of these parameters in the offer MUST be the same as that in the answer, either explicitly signaled or implicitly inferred.
o The level-id parameter is changeable as long as the highest level indicated by the answer is either equal to or lower than that in the offer. Note that the highest level is indicated by level-id and max-recv-level-id together.
In SDP offer/answer, when the SDP answer does include the recv-sub-layer-id parameter that is less than the sprop-sub-layer-id parameter in the SDP offer, the set of profile-space, tier-flag, profile-id, profile-compatibility-indicator, interop-constraints, and level-id parameters included in the answer MUST be consistent with that for the chosen sub-layer representation as indicated in the SDP offer, with the exception that the level-id parameter in the SDP answer is changeable as long as the highest level indicated by the answer is either lower than or equal to that in the offer.
More specifications of these parameters, including how they relate to the values of the profile, tier, and level syntax elements specified in [HEVC] are provided below.
profile-space, profile-id:
The value of profile-space MUST be in the range of 0 to 3, inclusive. The value of profile-id MUST be in the range of 0 to 31, inclusive.
When profile-space is not present, a value of 0 MUST be inferred. When profile-id is not present, a value of 1 (i.e., the Main profile) MUST be inferred.
When used to indicate properties of a bitstream, profile-space and profile-id are derived from the profile, tier, and level syntax elements in SPS or VPS NAL units as follows, where general_profile_space, general_profile_idc, sub_layer_profile_space[j], and sub_layer_profile_idc[j] are specified in [HEVC]:

If the RTP stream is the highest RTP stream, the following applies:
o profile-space = general_profile_space
o profile-id = general_profile_idc
Otherwise (the RTP stream is a dependee RTP stream), the following applies, with j being the value of the sprop-sub-layer-id parameter:
o profile-space = sub_layer_profile_space[j]
o profile-id = sub_layer_profile_idc[j]
tier-flag, level-id:
The value of tier-flag MUST be in the range of 0 to 1, inclusive. The value of level-id MUST be in the range of 0 to 255, inclusive.
If the tier-flag and level-id parameters are used to indicate properties of a bitstream, they indicate the tier and the highest level the bitstream complies with.
If the tier-flag and level-id parameters are used for capability exchange, the following applies. If max-recv-level-id is not present, the default level defined by level-id indicates the highest level the codec wishes to support. Otherwise, max-recv-level-id indicates the highest level the codec supports for receiving. For either receiving or sending, all levels that are lower than the highest level supported MUST also be supported.
If no tier-flag is present, a value of 0 MUST be inferred; if no level-id is present, a value of 93 (i.e., level 3.1) MUST be inferred.
When used to indicate properties of a bitstream, the tier-flag and level-id parameters are derived from the profile, tier, and level syntax elements in SPS or VPS NAL units as follows, where general_tier_flag, general_level_idc, sub_layer_tier_flag[j], and sub_layer_level_idc[j] are specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following applies:
o tier-flag = general_tier_flag
o level-id = general_level_idc
Otherwise (the RTP stream is a dependee RTP stream), the following applies, with j being the value of the sprop-sub-layer-id parameter:
o tier-flag = sub_layer_tier_flag[j]
o level-id = sub_layer_level_idc[j]
interop-constraints:
A base16 [RFC4648] (hexadecimal) representation of six bytes of data, consisting of progressive_source_flag, interlaced_source_flag, non_packed_constraint_flag, frame_only_constraint_flag, and reserved_zero_44bits.
If the interop-constraints parameter is not present, the following MUST be inferred:
o progressive_source_flag = 1
o interlaced_source_flag = 0
o non_packed_constraint_flag = 1
o frame_only_constraint_flag = 1
o reserved_zero_44bits = 0
When the interop-constraints parameter is used to indicate properties of a bitstream, the following applies, where general_progressive_source_flag, general_interlaced_source_flag, general_non_packed_constraint_flag, general_frame_only_constraint_flag, general_reserved_zero_44bits, sub_layer_progressive_source_flag[j], sub_layer_interlaced_source_flag[j], sub_layer_non_packed_constraint_flag[j], sub_layer_frame_only_constraint_flag[j], and sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:
If the RTP stream is the highest RTP stream, the following applies:
o progressive_source_flag = general_progressive_source_flag
o interlaced_source_flag = general_interlaced_source_flag
o non_packed_constraint_flag = general_non_packed_constraint_flag
o frame_only_constraint_flag = general_frame_only_constraint_flag
o reserved_zero_44bits = general_reserved_zero_44bits
Otherwise (the RTP stream is a dependee RTP stream), the following applies, with j being the value of the sprop-sub-layer-id parameter:
o progressive_source_flag = sub_layer_progressive_source_flag[j]
o interlaced_source_flag = sub_layer_interlaced_source_flag[j]
o non_packed_constraint_flag = sub_layer_non_packed_constraint_flag[j]
o frame_only_constraint_flag = sub_layer_frame_only_constraint_flag[j]
o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]
Using interop-constraints for capability exchange results in a requirement on any bitstream to be compliant with the interop-constraints.
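The six-byte layout described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the bit order (the four flags as the most significant bits, in the order listed, followed by the 44 reserved bits) is an assumption based on the order in which the fields are enumerated.

```python
def pack_interop_constraints(progressive=1, interlaced=0, non_packed=1,
                             frame_only=1, reserved_44=0):
    """Return the 12-character base16 form of the six interop-constraints bytes.

    The defaults are the values inferred when the parameter is absent.
    """
    if not 0 <= reserved_44 < (1 << 44):
        raise ValueError("reserved_zero_44bits must fit in 44 bits")
    for flag in (progressive, interlaced, non_packed, frame_only):
        if flag not in (0, 1):
            raise ValueError("flags must be 0 or 1")
    # Assumed layout: four flag bits first (most significant), then 44 reserved bits.
    flags = (progressive << 3) | (interlaced << 2) | (non_packed << 1) | frame_only
    return ((flags << 44) | reserved_44).to_bytes(6, "big").hex().upper()


def unpack_interop_constraints(text):
    """Split a base16 interop-constraints value back into its five fields."""
    raw = bytes.fromhex(text)
    if len(raw) != 6:
        raise ValueError("interop-constraints must encode exactly six bytes")
    value = int.from_bytes(raw, "big")
    return {
        "progressive_source_flag": (value >> 47) & 1,
        "interlaced_source_flag": (value >> 46) & 1,
        "non_packed_constraint_flag": (value >> 45) & 1,
        "frame_only_constraint_flag": (value >> 44) & 1,
        "reserved_zero_44bits": value & ((1 << 44) - 1),
    }
```

Under the assumed layout, the inferred defaults correspond to the value B00000000000.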
profile-compatibility-indicator:
A base16 [RFC4648] representation of four bytes of data.
When profile-compatibility-indicator is used to indicate properties of a bitstream, the following applies, where general_profile_compatibility_flag[j] and sub_layer_profile_compatibility_flag[i][j] are specified in [HEVC]:
The profile-compatibility-indicator in this case indicates the profiles, in addition to the profile defined by profile-space, profile-id, and interop-constraints, to which the bitstream conforms. A decoder that conforms to any of the profiles the bitstream conforms to would be capable of decoding the bitstream. These additional profiles are defined by profile-space, each set bit of profile-compatibility-indicator, and interop-constraints.
If the RTP stream is the highest RTP stream, the following applies for each value of j in the range of 0 to 31, inclusive:
o bit j of profile-compatibility-indicator = general_profile_compatibility_flag[j]
Otherwise (the RTP stream is a dependee RTP stream), the following applies for i equal to sprop-sub-layer-id and for each value of j in the range of 0 to 31, inclusive:
o bit j of profile-compatibility-indicator = sub_layer_profile_compatibility_flag[i][j]
Using profile-compatibility-indicator for capability exchange results in a requirement on any bitstream to be compliant with the profile-compatibility-indicator. This is intended to handle cases where any future HEVC profile is defined as an intersection of two or more profiles.
If this parameter is not present, this parameter defaults to the following: bit j, with j equal to profile-id, of profile-compatibility-indicator is inferred to be equal to 1, and all other bits are inferred to be equal to 0.
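The default rule above can be illustrated with a short sketch. The function name is hypothetical, and the text does not state which end of the four bytes holds bit 0; the sketch assumes bit 0 is the most significant bit, which matches the order in which the compatibility flags appear in the profile_tier_level( ) syntax structure. Treat that ordering as an assumption to be checked against [HEVC].

```python
def default_profile_compatibility_indicator(profile_id=1):
    """Return the inferred base16 value: only bit j == profile_id is set."""
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0 to 31")
    # Assumption: bit 0 is the most significant bit of the 32-bit field.
    value = 1 << (31 - profile_id)
    return f"{value:08X}"
```

With the inferred profile-id of 1 (Main profile), this sketch yields 40000000.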
sprop-sub-layer-id:
This parameter MAY be used to indicate the highest allowed value of TID in the bitstream. When not present, the value of sprop-sub-layer-id is inferred to be equal to 6.
The value of sprop-sub-layer-id MUST be in the range of 0 to 6, inclusive.
recv-sub-layer-id:
This parameter MAY be used to signal a receiver’s choice of the offered or declared sub-layer representations in the sprop-vps. The value of recv-sub-layer-id indicates the TID of the highest sub-layer of the bitstream that a receiver supports. When not present, the value of recv-sub-layer-id is inferred to be equal to the value of the sprop-sub-layer-id parameter in the SDP offer.
The value of recv-sub-layer-id MUST be in the range of 0 to 6, inclusive.
max-recv-level-id:
This parameter MAY be used to indicate the highest level a receiver supports. The highest level the receiver supports is equal to the value of max-recv-level-id divided by 30.
The value of max-recv-level-id MUST be in the range of 0 to 255, inclusive.
When max-recv-level-id is not present, the value is inferred to be equal to level-id.
max-recv-level-id MUST NOT be present when the highest level the receiver supports is not higher than the default level.
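As a minimal sketch of how level-id and max-recv-level-id together determine the highest level, the following helper applies the defaults and the division by 30 stated above. The function name and the dictionary-based parameter container are illustrative assumptions, not part of this memo.

```python
DEFAULT_LEVEL_ID = 93  # inferred when level-id is absent (level 3.1)


def highest_receive_level(params):
    """Return the highest level a receiver supports, as a level number.

    params maps media type parameter names to their string values.
    """
    level_id = int(params.get("level-id", DEFAULT_LEVEL_ID))
    max_recv = int(params.get("max-recv-level-id", level_id))
    for value in (level_id, max_recv):
        if not 0 <= value <= 255:
            raise ValueError("level values must be in the range 0 to 255")
    if "max-recv-level-id" in params and max_recv <= level_id:
        # The parameter is only allowed when it raises the level above the default.
        raise ValueError("max-recv-level-id must exceed the default level")
    return max_recv / 30
```

For instance, an empty parameter set gives level 3.1, while level-id=93 combined with max-recv-level-id=120 gives level 4.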
tx-mode:
This parameter indicates whether the transmission mode is SRST, MRST, or MRMT.
The value of tx-mode MUST be equal to “SRST”, “MRST” or “MRMT”. When not present, the value of tx-mode is inferred to be equal to “SRST”.
If the value is equal to “MRST”, MRST MUST be in use. Otherwise, if the value is equal to “MRMT”, MRMT MUST be in use. Otherwise (the value is equal to “SRST”), SRST MUST be in use.
The value of tx-mode MUST be equal to “MRST” for all RTP streams in an MRST.
The value of tx-mode MUST be equal to “MRMT” for all RTP streams in an MRMT.
sprop-vps:
This parameter MAY be used to convey any video parameter set NAL unit of the bitstream for out-of-band transmission of video parameter sets. The parameter MAY also be used for capability exchange and to indicate sub-stream characteristics (i.e., properties of sub-layer representations as defined in [HEVC]). The value of the parameter is a comma-separated (’,’) list of base64 [RFC4648] representations of the video parameter set NAL units as specified in Section 7.3.2.1 of [HEVC].
The sprop-vps parameter MAY contain one or more than one video parameter set NAL unit. However, all other video parameter sets contained in the sprop-vps parameter MUST be consistent with the first video parameter set in the sprop-vps parameter. A video parameter set vpsB is said to be consistent with another video parameter set vpsA if any decoder that conforms to the profile, tier, level, and constraints indicated by the 12 bytes of data starting from the syntax element general_profile_space to the syntax element general_level_idc, inclusive, in the first profile_tier_level( ) syntax structure in vpsA can decode any bitstream that conforms to the profile, tier, level, and constraints indicated by the 12 bytes of data starting from the syntax element general_profile_space to the syntax element general_level_idc, inclusive, in the first profile_tier_level( ) syntax structure in vpsB.
sprop-sps:
This parameter MAY be used to convey sequence parameter set NAL units of the bitstream for out-of-band transmission of sequence parameter sets. The value of the parameter is a comma-separated (’,’) list of base64 [RFC4648] representations of the sequence parameter set NAL units as specified in Section 7.3.2.2 of [HEVC].
sprop-pps:
This parameter MAY be used to convey picture parameter set NAL units of the bitstream for out-of-band transmission of picture parameter sets. The value of the parameter is a comma-separated (’,’) list of base64 [RFC4648] representations of the picture parameter set NAL units as specified in Section 7.3.2.3 of [HEVC].
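The comma-separated base64 format shared by sprop-vps, sprop-sps, and sprop-pps can be unpacked along the following lines. This is a sketch with hypothetical helper names; it relies on the NAL unit type occupying the six bits after the forbidden zero bit in the first header byte, and on the type values 32, 33, and 34 for VPS, SPS, and PPS as defined in [HEVC].

```python
import base64

NAL_TYPE_NAMES = {32: "VPS", 33: "SPS", 34: "PPS"}


def decode_sprop(value):
    """Split a comma-separated sprop-* value and base64-decode each NAL unit."""
    return [base64.b64decode(item, validate=True)
            for item in value.split(",") if item]


def nal_unit_type(nal):
    """Extract the 6-bit NAL unit type from the first NAL unit header byte."""
    return (nal[0] >> 1) & 0x3F
```

A receiver could use nal_unit_type() to check that each decoded entry is of the kind the parameter name promises before handing it to the decoder.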
sprop-sei:
This parameter MAY be used to convey one or more SEI messages that describe bitstream characteristics. When present, a decoder can rely on the bitstream characteristics that are described in the SEI messages for the entire duration of the session, independently from the persistence scopes of the SEI messages as specified in [HEVC].
The value of the parameter is a comma-separated (’,’) list of base64 [RFC4648] representations of SEI NAL units as specified in Section 7.3.2.4 of [HEVC].
Informative note: Intentionally, no list of applicable or inapplicable SEI messages is specified here. Conveying certain SEI messages in sprop-sei may be sensible in some application scenarios and meaningless in others. However, a few examples are described below:
1) In an environment where the bitstream was created from film-based source material, and no splicing is going to occur during the lifetime of the session, the film grain characteristics SEI message or the tone mapping information SEI message are likely meaningful, and sending them in sprop-sei rather than in the bitstream at each entry point may help with saving bits and allows one to configure the renderer only once, avoiding unwanted artifacts.
2) The structure of pictures information SEI message in sprop-sei can be used to inform a decoder of information on the NAL unit types, picture-order count values, and prediction dependencies of a sequence of pictures. Having such knowledge can be helpful for error recovery.
3) Examples of SEI messages that would be meaningless to convey in sprop-sei include the decoded picture hash SEI message (it is close to impossible that all decoded pictures have the same hash value), the display orientation SEI message when the device is a handheld device (as the display orientation may change when the handheld device is turned around), or the filler payload SEI message (as there is no point in just having more bits in SDP).
max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:
These parameters MAY be used to signal the capabilities of a receiver implementation. These parameters MUST NOT be used for any other purpose. The highest level (specified by max-recv-level-id) MUST be the highest that the receiver is fully capable of supporting. max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc MAY be used to indicate capabilities of the receiver that extend the required capabilities of the highest level, as specified below.
When more than one parameter from the set (max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the receiver MUST support all signaled capabilities simultaneously. For example, if both max-lsr and max-br are present, the highest level with the extension of both the picture rate and bitrate is supported. That is, the receiver is able to decode bitstreams in which the luma sample rate is up to max-lsr (inclusive), the bitrate is up to max-br (inclusive), the coded picture buffer size is derived as specified in the semantics of the max-br parameter below, and the other properties comply with the highest level specified by max-recv-level-id.
Informative note: When the OPTIONAL media type parameters are used to signal the properties of a bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc are not present, the values of profile-space, tier-flag, profile-id, profile-compatibility-indicator, interop-constraints, and level-id must always be such that the bitstream complies fully with the specified profile, tier, and level.
max-lsr:
The value of max-lsr is an integer indicating the maximum processing rate in units of luma samples per second. The max-lsr parameter signals that the receiver is capable of decoding video at a higher rate than is required by the highest level.
When max-lsr is signaled, the receiver MUST be able to decode bitstreams that conform to the highest level, with the exception that the MaxLumaSR value in Table A-2 of [HEVC] for the highest level is replaced with the value of max-lsr. Senders MAY use this knowledge to send pictures of a given size at a higher picture rate than is indicated in the highest level.
When not present, the value of max-lsr is inferred to be equal to the value of MaxLumaSR given in Table A-2 of [HEVC] for the highest level.
The value of max-lsr MUST be in the range of MaxLumaSR to 16 * MaxLumaSR, inclusive, where MaxLumaSR is given in Table A-2 of [HEVC] for the highest level.
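The inference and range rules for max-lsr (and, identically, for max-lps, max-cpb, max-br, max-tr, and max-tc) can be sketched as below. The helper name is hypothetical, and the level limit itself has to be taken from the tables in [HEVC]; no table values are reproduced here.

```python
def effective_limit(params, name, level_limit):
    """Return the limit in force for an extension parameter such as max-lsr.

    level_limit is the value the highest level prescribes; it has to be
    looked up in [HEVC] by the caller.
    """
    if name not in params:
        return level_limit  # inferred default when the parameter is absent
    value = int(params[name])
    if not level_limit <= value <= 16 * level_limit:
        raise ValueError(
            f"{name} must be between {level_limit} and {16 * level_limit}")
    return value
```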
max-lps:
The value of max-lps is an integer indicating the maximum picture size in units of luma samples. The max-lps parameter signals that the receiver is capable of decoding larger picture sizes than are required by the highest level. When max-lps is signaled, the receiver MUST be able to decode bitstreams that conform to the highest level, with the exception that the MaxLumaPS value in Table A-1 of [HEVC] for the highest level is replaced with the value of max-lps. Senders MAY use this knowledge to send larger pictures at a proportionally lower picture rate than is indicated in the highest level.
When not present, the value of max-lps is inferred to be equal to the value of MaxLumaPS given in Table A-1 of [HEVC] for the highest level.
The value of max-lps MUST be in the range of MaxLumaPS to 16 * MaxLumaPS, inclusive, where MaxLumaPS is given in Table A-1 of [HEVC] for the highest level.
max-cpb:
The value of max-cpb is an integer indicating the maximum coded picture buffer size in units of CpbBrVclFactor bits for the VCL HRD parameters and in units of CpbBrNalFactor bits for the NAL HRD parameters, where CpbBrVclFactor and CpbBrNalFactor are defined in Section A.4 of [HEVC]. The max-cpb parameter signals that the receiver has more memory than the minimum amount of coded picture buffer memory required by the highest level. When max-cpb is signaled, the receiver MUST be able to decode bitstreams that conform to the highest level, with the exception that the MaxCPB value in Table A-1 of [HEVC] for the highest level is replaced with the value of max-cpb. Senders MAY use this knowledge to construct coded bitstreams with greater variation of bitrate than can be achieved with the MaxCPB value in Table A-1 of [HEVC].
When not present, the value of max-cpb is inferred to be equal to the value of MaxCPB given in Table A-1 of [HEVC] for the highest level.
The value of max-cpb MUST be in the range of MaxCPB to 16 * MaxCPB, inclusive, where MaxCPB is given in Table A-1 of [HEVC] for the highest level.
Informative note: The coded picture buffer is used in the hypothetical reference decoder (Annex C of [HEVC]). The use of the hypothetical reference decoder is recommended in HEVC encoders to verify that the produced bitstream conforms to the standard and to control the output bitrate. Thus, the coded picture buffer is conceptually independent of any other potential buffers in the receiver, including de-packetization and de-jitter buffers. The coded picture buffer need not be implemented in decoders as specified in Annex C of [HEVC], but rather standard-compliant decoders can have any buffering arrangements provided that they can decode standard-compliant bitstreams. Thus, in practice, the input buffer for a video decoder can be integrated with de-packetization and de-jitter buffers of the receiver.
max-dpb:
The value of max-dpb is an integer indicating the maximum decoded picture buffer size in units of decoded pictures at the MaxLumaPS for the highest level, i.e., the number of decoded pictures at the maximum picture size defined by the highest level. The value of max-dpb MUST be in the range of 1 to 16, inclusive. The max-dpb parameter signals that the receiver has more memory than the minimum amount of decoded picture buffer memory required by default, which is MaxDpbPicBuf as defined in [HEVC] (equal to 6). When max-dpb is signaled, the receiver MUST be able to decode bitstreams that conform to the highest level, with the exception that the MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with the value of max-dpb. Consequently, a receiver that signals max-dpb MUST be capable of storing the following number of decoded pictures (MaxDpbSize) in its decoded picture buffer:
if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
   MaxDpbSize = Min( 4 * max-dpb, 16 )
else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
   MaxDpbSize = Min( 2 * max-dpb, 16 )
else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
   MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
else
   MaxDpbSize = max-dpb
where MaxLumaPS is given in Table A-1 of [HEVC] for the highest level and PicSizeInSamplesY is the current size of each decoded picture in units of luma samples as defined in [HEVC].
The value of max-dpb MUST be greater than or equal to the value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC]. Senders MAY use this knowledge to construct coded bitstreams with improved compression.
When not present, the value of max-dpb is inferred to be equal to the value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].
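The MaxDpbSize derivation above translates directly into code. The following is a sketch only: the function name is hypothetical, the division is assumed to be integer division with truncation, and the range check combines the upper bound of 16 with the requirement that max-dpb be at least MaxDpbPicBuf (6).

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures the receiver must be able to store.

    max_luma_ps is MaxLumaPS for the highest level (looked up in [HEVC] by
    the caller); max_dpb defaults to the inferred value of 6.
    """
    if not 6 <= max_dpb <= 16:
        raise ValueError("max-dpb must be in the range 6 to 16")
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)  # assumed: truncating division
    return max_dpb
```

Smaller pictures relative to MaxLumaPS therefore allow more pictures in the buffer, capped at 16.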
Informative note: This parameter was added primarily to complement a similar codepoint in the ITU-T Recommendation H.245, so as to facilitate signaling gateway designs. The decoded picture buffer stores reconstructed samples. There is no relationship between the size of the decoded picture buffer and the buffers used in RTP, especially de-packetization and de-jitter buffers.
max-br:
The value of max-br is an integer indicating the maximum video bitrate in units of CpbBrVclFactor bits per second for the VCL HRD parameters and in units of CpbBrNalFactor bits per second for the NAL HRD parameters, where CpbBrVclFactor and CpbBrNalFactor are defined in Section A.4 of [HEVC].
The max-br parameter signals that the video decoder of the receiver is capable of decoding video at a higher bitrate than is required by the highest level.
When max-br is signaled, the video codec of the receiver MUST be able to decode bitstreams that conform to the highest level, with the following exceptions in the limits specified by the highest level:
o The value of max-br replaces the MaxBR value in Table A-2 of [HEVC] for the highest level.
o When the max-cpb parameter is not present, the result of the following formula replaces the value of MaxCPB in Table A-1 of [HEVC]:
(MaxCPB of the highest level) * max-br / (MaxBR of the highest level)
For example, if a receiver signals capability for Main profile Level 2 with max-br equal to 2000, this indicates a maximum video bitrate of 2000 kbits/sec for VCL HRD parameters, a maximum video bitrate of 2200 kbits/sec for NAL HRD parameters, and a CPB size of 2000000 bits (2000000 / 1500000 * 1500000).
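The arithmetic of this example can be reproduced with a small sketch. The function name and return structure are hypothetical. The factors 1000 and 1100 and the Level 2 limits of 1500 are the values implied by the example's own figures; the rounding of the derived CPB size (truncating division) is an assumption, since the formula above does not state one.

```python
def limits_with_max_br(max_br, level_max_br, level_max_cpb, max_cpb=None,
                       vcl_factor=1000, nal_factor=1100):
    """Derive bitrate and CPB limits, in bits, from a signaled max-br.

    level_max_br and level_max_cpb are MaxBR and MaxCPB of the highest level;
    vcl_factor and nal_factor stand for CpbBrVclFactor and CpbBrNalFactor.
    """
    if not level_max_br <= max_br <= 16 * level_max_br:
        raise ValueError("max-br is outside the permitted range")
    if max_cpb is None:
        # max-cpb absent: scale MaxCPB by the same ratio as the bitrate.
        cpb_units = level_max_cpb * max_br // level_max_br
    else:
        cpb_units = max_cpb
    return {
        "vcl_bitrate_bps": max_br * vcl_factor,
        "nal_bitrate_bps": max_br * nal_factor,
        "vcl_cpb_bits": cpb_units * vcl_factor,
        "nal_cpb_bits": cpb_units * nal_factor,
    }
```

Calling limits_with_max_br(2000, 1500, 1500) reproduces the 2000 kbits/sec, 2200 kbits/sec, and 2000000-bit figures of the example.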
Senders MAY use this knowledge to send higher bitrate video as allowed in the level definition of Annex A of [HEVC] to achieve improved video quality.
When not present, the value of max-br is inferred to be equal to the value of MaxBR given in Table A-2 of [HEVC] for the highest level.
The value of max-br MUST be in the range of MaxBR to 16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of [HEVC] for the highest level.
Informative note: This parameter was added primarily to complement a similar codepoint in the ITU-T Recommendation H.245, so as to facilitate signaling gateway designs. The assumption that the network is capable of handling such bitrates at any given time cannot be made from the value of this parameter. In particular, no conclusion can be drawn that the signaled bitrate is possible under congestion control constraints.
max-tr:
The value of max-tr is an integer indicating the maximum number of tile rows. The max-tr parameter signals that the receiver is capable of decoding video with a larger number of tile rows than the value allowed by the highest level.
When max-tr is signaled, the receiver MUST be able to decode bitstreams that conform to the highest level, with the exception that the MaxTileRows value in Table A-1 of [HEVC] for the highest level is replaced with the value of max-tr.
Senders MAY use this knowledge to send pictures utilizing a larger number of tile rows than the value allowed by the highest level.
When not present, the value of max-tr is inferred to be equal to the value of MaxTileRows given in Table A-1 of [HEVC] for the highest level.
The value of max-tr MUST be in the range of MaxTileRows to 16 * MaxTileRows, inclusive, where MaxTileRows is given in Table A-1 of [HEVC] for the highest level.
max-tc:
The value of max-tc is an integer indicating the maximum number of tile columns. The max-tc parameter signals that the receiver is capable of decoding video with a larger number of tile columns than the value allowed by the highest level.
When max-tc is signaled, the receiver MUST be able to decode bitstreams that conform to the highest level, with the exception that the MaxTileCols value in Table A-1 of [HEVC] for the highest level is replaced with the value of max-tc.
Senders MAY use this knowledge to send pictures utilizing a larger number of tile columns than the value allowed by the highest level.
When not present, the value of max-tc is inferred to be equal to the value of MaxTileCols given in Table A-1 of [HEVC] for the highest level.
The value of max-tc MUST be in the range of MaxTileCols to 16 * MaxTileCols, inclusive, where MaxTileCols is given in Table A-1 of [HEVC] for the highest level.
max-fps:
The value of max-fps is an integer indicating the maximum picture rate in units of pictures per 100 seconds that can be effectively processed by the receiver. The max-fps parameter MAY be used to signal that the receiver has a constraint in that it is not capable of processing video effectively at the full picture rate that is implied by the highest level and, when present, one or more of the parameters max-lsr, max-lps, and max-br.
The value of max-fps is not necessarily the picture rate at which the maximum picture size can be sent; it constitutes a constraint on the maximum picture rate for all resolutions.
Informative note: The max-fps parameter is semantically different from max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc in that max-fps is used to signal a constraint, lowering the maximum picture rate from what is implied by other parameters.
The encoder MUST use a picture rate equal to or less than this value. In cases where the max-fps parameter is absent, the encoder is free to choose any picture rate according to the highest level and any signaled optional parameters.
The value of max-fps MUST be smaller than or equal to the full picture rate that is implied by the highest level and, when present, one or more of the parameters max-lsr, max-lps, and max-br.
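Because max-fps is expressed in pictures per 100 seconds, an encoder has to scale it before comparing it with a picture rate in pictures per second. A minimal sketch, with a hypothetical function name, that only models the cap and not the relationship to the highest level:

```python
def capped_picture_rate(desired_fps, max_fps=None):
    """Clamp an encoder's desired picture rate to a receiver's max-fps.

    max_fps is the raw parameter value, in pictures per 100 seconds;
    None means the parameter is absent and imposes no extra constraint.
    """
    if max_fps is None:
        return desired_fps
    return min(desired_fps, max_fps / 100)
```

For example, max-fps=3000 caps the encoder at 30 pictures per second.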
sprop-max-don-diff:
If tx-mode is equal to “SRST” and there is no NAL unit naluA that is followed in transmission order by any NAL unit preceding naluA in decoding order (i.e., the transmission order of the NAL units is the same as the decoding order), the value of this parameter MUST be equal to 0.
Otherwise, if tx-mode is equal to “MRST” or “MRMT” and the decoding order of the NAL units of all the RTP streams is the same as both the NAL unit transmission order and the NAL unit output order, the value of this parameter MUST be equal to either 0 or 1.
Otherwise, if tx-mode is equal to “MRST” or “MRMT” and the decoding order of the NAL units of all the RTP streams is the same as the NAL unit transmission order but not the same as the NAL unit output order, the value of this parameter MUST be equal to 1.
Otherwise, this parameter specifies the maximum absolute difference between the decoding order number (i.e., AbsDon) values of any two NAL units naluA and naluB, where naluA follows naluB in decoding order and precedes naluB in transmission order.
The value of sprop-max-don-diff MUST be an integer in the range of 0 to 32767, inclusive.
When not present, the value of sprop-max-don-diff is inferred to be equal to 0.
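For the general case (the last "Otherwise" above), a sender could compute the value from the AbsDon values of its NAL units listed in transmission order. The sketch below uses a hypothetical function name and does not model the special cases for SRST, MRST, and MRMT; it only finds the largest AbsDon difference between a NAL unit and any later-transmitted NAL unit that precedes it in decoding order.

```python
def max_don_diff(abs_dons_in_transmission_order):
    """Largest AbsDon(naluA) - AbsDon(naluB) where naluA is sent before naluB
    but follows naluB in decoding order (0 if nothing is reordered)."""
    largest_seen = None
    result = 0
    for abs_don in abs_dons_in_transmission_order:
        if largest_seen is not None and largest_seen > abs_don:
            # An earlier-transmitted NAL unit follows this one in decoding order.
            result = max(result, largest_seen - abs_don)
        if largest_seen is None or abs_don > largest_seen:
            largest_seen = abs_don
    if result > 32767:
        raise ValueError("sprop-max-don-diff must not exceed 32767")
    return result
```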
sprop-depack-buf-nalus:
This parameter specifies the maximum number of NAL units that precede a NAL unit in transmission order and follow the NAL unit in decoding order.
The value of sprop-depack-buf-nalus MUST be an integer in the range of 0 to 32767, inclusive.
When not present, the value of sprop-depack-buf-nalus is inferred to be equal to 0.
When sprop-max-don-diff is present and greater than 0, this parameter MUST be present and the value MUST be greater than 0.
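The definition of sprop-depack-buf-nalus can be read as a count taken per NAL unit and then maximized. A straightforward (quadratic) sketch, with a hypothetical function name and AbsDon values standing in for decoding order:

```python
def depack_buf_nalus(abs_dons_in_transmission_order):
    """Maximum, over all NAL units, of the number of NAL units that precede
    it in transmission order and follow it in decoding order."""
    seq = list(abs_dons_in_transmission_order)
    worst = 0
    for j, abs_don in enumerate(seq):
        pending = sum(1 for earlier in seq[:j] if earlier > abs_don)
        worst = max(worst, pending)
    return worst
```

A stream sent in decoding order yields 0, whereas any reordering yields a value greater than 0, in line with the requirement above.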
sprop-depack-buf-bytes:
This parameter signals the required size of the de-packetization buffer in units of bytes. The value of the parameter MUST be greater than or equal to the maximum buffer occupancy (in units of bytes) of the de-packetization buffer as specified in Section 6.
The value of sprop-depack-buf-bytes MUST be an integer in the range of 0 to 4294967295, inclusive.
When sprop-max-don-diff is present and greater than 0, this parameter MUST be present and the value MUST be greater than 0. When not present, the value of sprop-depack-buf-bytes is inferred to be equal to 0.
Informative note: The value of sprop-depack-buf-bytes indicates the required size of the de-packetization buffer only. When network jitter can occur, an appropriately sized jitter buffer has to be available as well.
depack-buf-cap:
This parameter signals the capabilities of a receiver implementation and indicates the amount of de-packetization buffer space in units of bytes that the receiver has available for reconstructing the NAL unit decoding order from NAL units carried in one or more RTP streams. A receiver is able to handle any RTP stream, and all RTP streams the RTP stream depends on, when present, for which the value of the sprop-depack-buf-bytes parameter is smaller than or equal to this parameter.
When not present, the value of depack-buf-cap is inferred to be equal to 4294967295. The value of depack-buf-cap MUST be an integer in the range of 1 to 4294967295, inclusive.
Informative note: depack-buf-cap indicates the maximum possible size of the de-packetization buffer of the receiver only, without allowing for network jitter.
sprop-segmentation-id:
This parameter MAY be used to signal the segmentation tools present in the bitstream and that can be used for parallelization. The value of sprop-segmentation-id MUST be an integer in the range of 0 to 3, inclusive. When not present, the value of sprop-segmentation-id is inferred to be equal to 0.
When sprop-segmentation-id is equal to 0, no information about the segmentation tools is provided. When sprop-segmentation-id is equal to 1, it indicates that slices are present in the bitstream. When sprop-segmentation-id is equal to 2, it indicates that tiles are present in the bitstream. When sprop-segmentation-id is equal to 3, it indicates that WPP is used in the bitstream.
sprop-spatial-segmentation-idc:
A base16 [RFC4648] representation of the syntax element min_spatial_segmentation_idc as specified in [HEVC]. This parameter MAY be used to describe parallelization capabilities of the bitstream.
dec-parallel-cap:
This parameter MAY be used to indicate the decoder’s additional decoding capabilities given the presence of tools enabling parallel decoding, such as slices, tiles, and WPP, in the bitstream. The decoding capability of the decoder may vary with the setting of the parallel decoding tools present in the bitstream, e.g., the size of the tiles that are present in a bitstream. Therefore, multiple capability points may be provided, each indicating the minimum required decoding capability that is associated with a parallelism requirement, which is a requirement on the bitstream that enables parallel decoding.
Each capability point is defined as a combination of 1) a parallelism requirement, 2) a profile (determined by profile-space and profile-id), 3) a highest level, and 4) a maximum processing rate, a maximum picture size, and a maximum video bitrate that may be equal to or greater than that determined by the highest level. The parameter’s syntax in ABNF [RFC5234] is as follows:
dec-parallel-cap = "dec-parallel-cap={" cap-point *("," cap-point) "}"
cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" cap-parameter)
spatial-seg-idc = 1*4DIGIT ; (1-4095)
cap-parameter = tier-flag / level-id / max-lsr / max-lps / max-br
tier-flag = "tier-flag" EQ ("0" / "1")
level-id = "level-id" EQ 1*3DIGIT ; (0-255)
max-lsr = "max-lsr" EQ 1*20DIGIT ; (0-18,446,744,073,709,551,615)
max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)
max-br = "max-br" EQ 1*20DIGIT ; (0-18,446,744,073,709,551,615)
EQ = "="
The set of capability points expressed by the dec-parallel-cap parameter is enclosed in a pair of curly braces ("{}"). Each set of two consecutive capability points is separated by a comma (’,’). Within each capability point, each set of two consecutive parameters, and, when present, their values, is separated by a semicolon (’;’).
The profile of all capability points is determined by profile-space and profile-id, which are outside the dec-parallel-cap parameter.
Each capability point starts with an indication of the parallelism requirement, which consists of a parallel tool type, which may be equal to ‘w’ or ‘t’, and a decimal value of the spatial-seg-idc parameter. When the type is ‘w’, the capability point is valid only for H.265 bitstreams with WPP in use, i.e., entropy_coding_sync_enabled_flag equal to 1. When the type is ‘t’, the capability point is valid only for H.265 bitstreams with WPP not in use (i.e., entropy_coding_sync_enabled_flag equal to 0). The capability point is valid only for H.265 bitstreams with min_spatial_segmentation_idc equal to or greater than spatial-seg-idc.
After the parallelism requirement indication, each capability point continues with one or more pairs of parameter and value in any order for any of the following parameters:
o tier-flag
o level-id
o max-lsr
o max-lps
o max-br
At most, one occurrence of each of the above five parameters is allowed within each capability point.
The values of dec-parallel-cap.tier-flag and dec-parallel-cap.level-id for a capability point indicate the highest level of the capability point. The values of dec-parallel-cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for a capability point indicate, respectively, the maximum processing rate in units of luma samples per second, the maximum picture size in units of luma samples, and the maximum video bitrate (in units of CpbBrVclFactor bits per second for the VCL HRD parameters and in units of CpbBrNalFactor bits per second for the NAL HRD parameters, where CpbBrVclFactor and CpbBrNalFactor are defined in Section A.4 of [HEVC]).
When not present, the value of dec-parallel-cap.tier-flag is inferred to be equal to the value of tier-flag outside the dec-parallel-cap parameter. When not present, the value of dec-parallel-cap.level-id is inferred to be equal to the value of max-recv-level-id outside the dec-parallel-cap parameter. When not present, the value of dec-parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred to be equal to the value of max-lsr, max-lps, or max-br, respectively, outside the dec-parallel-cap parameter.
The general decoding capability, expressed by the set of parameters outside of dec-parallel-cap, is defined as the capability point that is determined by the following combination of parameters: 1) the parallelism requirement corresponding to the value of sprop-segmentation-id equal to 0 for a bitstream, 2) the profile determined by profile-space, profile-id, profile-compatibility-indicator, and interop-constraints, 3) the tier and the highest level determined by tier-flag and max-recv-level-id, and 4) the maximum processing rate, the maximum picture size, and the maximum video bitrate determined by the highest level. The general decoding capability MUST NOT be included as one of the set of capability points in the dec-parallel-cap parameter.
For example, the following parameters express the general decoding capability of 720p30 (Level 3.1) plus an additional decoding capability of 1080p30 (Level 4) given that the spatially largest tile or slice used in the bitstream is equal to or less than 1/3 of the picture size:
a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}
For another example, the following parameters express an additional decoding capability of 1080p30, using dec-parallel-cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is used in the bitstream:
a=fmtp:98 level-id=93;dec-parallel-cap={w:8;max-lsr=62668800;max-lps=2088960}
Informative note: When min_spatial_segmentation_idc is present in a bitstream and WPP is not used, [HEVC] specifies that there is no slice or no tile in the bitstream containing more than 4 * PicSizeInSamplesY / ( min_spatial_segmentation_idc + 4 ) luma samples.
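The bound in the informative note can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name is hypothetical, and the picture size of 1920x1080 luma samples is an assumption made for illustration (the coded picture size of a real 1080p bitstream may differ).

```python
def max_segment_luma_samples(pic_width: int, pic_height: int,
                             min_spatial_segmentation_idc: int) -> float:
    """Upper bound on the luma samples in any slice or tile (WPP not used).

    Implements 4 * PicSizeInSamplesY / (min_spatial_segmentation_idc + 4).
    """
    pic_size_in_samples_y = pic_width * pic_height
    return 4 * pic_size_in_samples_y / (min_spatial_segmentation_idc + 4)


# Assumed picture size: 1920x1080 luma samples, with spatial-seg-idc 8
# as in the "t:8" example above.
bound = max_segment_luma_samples(1920, 1080, 8)
print(bound)                    # 691200.0
print(bound / (1920 * 1080))    # one third of the picture size
```

With a value of 8, the divisor is 12, so the bound is 4/12 of the picture, which matches the "1/3 of the picture size" wording of the first example.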
include-dph:
This parameter is used to indicate the capability and preference to utilize or include Decoded Picture Hash (DPH) SEI messages (see Section D.3.19 of [HEVC]) in the bitstream. DPH SEI messages can be used to detect picture corruption so the receiver can request picture repair, see Section 8. The value is a comma-separated list of hash types that is supported or requested to be used, each hash type provided as an unsigned integer value (0-255), with the hash types listed from most preferred to the least preferred. Example: “include-dph=0,2”, which indicates the capability for MD5 (most preferred) and Checksum (less preferred). If the parameter is not included or the value contains no hash types, then no capability to utilize DPH SEI messages is assumed. Note that DPH SEI messages MAY still be included in the bitstream even when there is no declaration of capability to use them, as in general SEI messages do not affect the normative decoding process and decoders are allowed to ignore SEI messages.
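As an illustration of the value syntax described above, the following sketch parses an include-dph value into an ordered preference list. The function name and the error handling are assumptions; the names for hash types 0 and 2 come from the example above, and the name for type 1 (CRC) is taken from the decoded picture hash SEI semantics in [HEVC].

```python
from typing import List

# Hash type 1 (CRC) is not mentioned in the text above; it is added here
# from the decoded picture hash SEI message semantics.
HASH_TYPE_NAMES = {0: "MD5", 1: "CRC", 2: "Checksum"}


def parse_include_dph(value: str) -> List[int]:
    """Return hash types in preference order (most preferred first).

    An empty list means no capability to utilize DPH SEI messages.
    """
    hash_types = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        number = int(token)  # raises ValueError on non-numeric input
        if not 0 <= number <= 255:
            raise ValueError("hash type out of range 0-255: %d" % number)
        hash_types.append(number)
    return hash_types


preferences = parse_include_dph("0,2")
print([HASH_TYPE_NAMES.get(h, "unknown") for h in preferences])
```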
Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550).
Security considerations:
See Section 9 of RFC 7798.
Published specification:
Please refer to RFC 7798 and its Section 12.
File extensions: none
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
Ye-Kui Wang (yekui.wang@gmail.com)
Intended usage: COMMON
Author: See Authors’ Addresses section of RFC 7798.
Change controller:
IETF Audio/Video Transport Payloads working group delegated from the IESG.

7.2. SDP Parameters
The receiver MUST ignore any parameter unspecified in this memo.

7.2.1. Mapping of Payload Type Parameters to SDP

The media type video/H265 string is mapped to fields in the Session Description Protocol (SDP) [RFC4566] as follows:
o The media name in the “m=” line of SDP MUST be video.
o The encoding name in the “a=rtpmap” line of SDP MUST be H265 (the media subtype).
o The clock rate in the “a=rtpmap” line MUST be 90000.
o The OPTIONAL parameters profile-space, profile-id, tier-flag, level-id, interop-constraints, profile-compatibility-indicator, sprop-sub-layer-id, recv-sub-layer-id, max-recv-level-id, tx-mode, max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc, max-fps, sprop-max-don-diff, sprop-depack-buf-nalus, sprop-depack-buf-bytes, depack-buf-cap, sprop-segmentation-id, sprop-spatial-segmentation-idc, dec-parallel-cap, and include-dph, when present, MUST be included in the “a=fmtp” line of SDP. This parameter is expressed as a media type string, in the form of a semicolon-separated list of parameter=value pairs.
o The OPTIONAL parameters sprop-vps, sprop-sps, and sprop-pps, when present, MUST be included in the “a=fmtp” line of SDP or conveyed using the “fmtp” source attribute as specified in Section 6.3 of [RFC5576]. For a particular media format (i.e., RTP payload type), sprop-vps, sprop-sps, or sprop-pps MUST NOT be both included in the “a=fmtp” line of SDP and conveyed using the “fmtp” source attribute. When included in the “a=fmtp” line of SDP, these parameters are expressed as a media type string, in the form of a semicolon-separated list of parameter=value pairs. When conveyed in the “a=fmtp” line of SDP for a particular payload type, the parameters sprop-vps, sprop-sps, and sprop-pps MUST be applied to each SSRC with the payload type. When conveyed using the “fmtp” source attribute, these parameters are only associated with the given source and payload type as parts of the “fmtp” source attribute.
Informative note: Conveyance of sprop-vps, sprop-sps, and sprop-pps using the “fmtp” source attribute allows for out-of-band transport of parameter sets in topologies like Topo-Video-switch-MCU as specified in [RFC7667].
An example of media representation in SDP is as follows:
m=video 49170 RTP/AVP 98
a=rtpmap:98 H265/90000
a=fmtp:98 profile-id=1; sprop-vps=
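The semicolon-separated parameter=value form can be split with a small parser. The sketch below is an illustration only, not a complete SDP parser; the function name is hypothetical. It assumes that a semicolon inside braces (as in a dec-parallel-cap value) belongs to the value rather than separating two parameters.

```python
from typing import Dict, Tuple


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split an 'a=fmtp:' line into (payload type, {parameter: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, rest = line[len(prefix):].partition(" ")

    params: Dict[str, str] = {}
    current = []   # characters of the parameter being collected
    depth = 0      # brace nesting depth, for dec-parallel-cap={...}

    def flush() -> None:
        item = "".join(current).strip()
        current.clear()
        if item:
            name, _, value = item.partition("=")
            params[name.strip()] = value.strip()

    for ch in rest:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            flush()            # top-level separator ends a parameter
        else:
            current.append(ch)
    flush()
    return int(payload_type), params


pt, params = parse_fmtp(
    "a=fmtp:98 level-id=93;dec-parallel-cap={w:8;max-lsr=62668800;max-lps=2088960}")
print(pt, params)
```

Parameters that this memo does not specify would then be dropped by the caller, in line with the rule that the receiver MUST ignore them.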

7.2.2. Usage with SDP Offer/Answer Model

When HEVC is offered over RTP using SDP in an offer/answer model [RFC3264] for negotiation for unicast usage, the following limitations and rules apply:
o The parameters identifying a media format configuration for HEVC are profile-space, profile-id, tier-flag, level-id, interop-constraints, profile-compatibility-indicator, and tx-mode. These media configuration parameters, except level-id, MUST be used symmetrically when the answerer does not include recv-sub-layer-id in the answer for the media format (payload type) or the included recv-sub-layer-id is equal to sprop-sub-layer-id in the offer. The answerer MUST:

1. maintain all configuration parameters with the values remaining the same as in the offer for the media format (payload type), with the exception that the value of level-id is changeable as long as the highest level indicated by the answer is not higher than that indicated by the offer;

2. include in the answer the recv-sub-layer-id parameter, with a value less than the sprop-sub-layer-id parameter in the offer, for the media format (payload type), and maintain all configuration parameters with the values being the same as signaled in the sprop-vps for the chosen sub-layer representation, with the exception that the value of level-id is changeable as long as the highest level indicated by the answer is not higher than the level indicated by the sprop-vps in the offer for the chosen sub-layer representation; or

3. remove the media format (payload type) completely (when one or more of the parameter values are not supported).
Informative note: The above requirement for symmetric use does not apply to level-id, and does not apply to the other bitstream or RTP stream properties and capability parameters.
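The first of the three options above can be sketched as a consistency check on parsed fmtp parameters. This is a simplified illustration under several assumptions: both sides spell out their parameters explicitly (an absent parameter is only equal to another absent parameter), an absent level-id is treated as 93 (Level 3.1), and the "highest level" is taken from level-id alone, ignoring max-recv-level-id. The function and constant names are hypothetical.

```python
from typing import Dict

# Configuration parameters that must match between offer and answer.
SYMMETRIC_PARAMS = (
    "profile-space", "profile-id", "tier-flag",
    "interop-constraints", "profile-compatibility-indicator", "tx-mode",
)

ASSUMED_DEFAULT_LEVEL_ID = "93"  # assumption: Level 3.1 when level-id is absent


def answer_keeps_configuration(offer: Dict[str, str],
                               answer: Dict[str, str]) -> bool:
    """Check option 1: same configuration, level-id not higher than offered."""
    for name in SYMMETRIC_PARAMS:
        if offer.get(name) != answer.get(name):
            return False
    offered_level = int(offer.get("level-id", ASSUMED_DEFAULT_LEVEL_ID))
    answered_level = int(answer.get("level-id", ASSUMED_DEFAULT_LEVEL_ID))
    return answered_level <= offered_level


offer = {"profile-id": "1", "tier-flag": "0", "level-id": "120", "tx-mode": "SRST"}
lower = {"profile-id": "1", "tier-flag": "0", "level-id": "93", "tx-mode": "SRST"}
print(answer_keeps_configuration(offer, lower))
```

An answerer for which this check cannot be satisfied would fall back to option 2 or remove the payload type as in option 3.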

o The profile-compatibility-indicator, when offered as sendonly, describes bitstream properties. The answerer MAY accept an RTP payload type even if the decoder is not capable of handling the profile indicated by the profile-space, profile-id, and interop-constraints parameters, but capable of any of the profiles indicated by the profile-space, profile-compatibility-indicator, and interop-constraints. However, when the profile-compatibility-indicator is used in a recvonly or sendrecv media description, the bitstream using this RTP payload type is required to conform to all profiles indicated by profile-space, profile-compatibility-indicator, and interop-constraints.
o To simplify handling and matching of these configurations, the same RTP payload type number used in the offer SHOULD also be used in the answer, as specified in [RFC3264].

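o The same RTP payload type number used in the offer for the media subtype H265 MUST be used in the answer when the answer includes recv-sub-layer-id. When the answer does not include recv-sub-layer-id, the answer MUST NOT contain a payload type number used in the offer for the media subtype H265 unless the configuration is exactly the same as in the offer or the configuration in the answer differs from that in the offer only in the value of level-id. The answer MAY contain the recv-sub-layer-id parameter if an HEVC bitstream contains multiple operation points (using temporal scalability and sub-layers) and sprop-vps is included in the offer, with sub-layer information present in the first video parameter set contained in sprop-vps. If sprop-vps is provided in an offer, an answerer MAY select a particular operation point indicated in the first video parameter set contained in sprop-vps. When the answer includes a recv-sub-layer-id that is less than the sprop-sub-layer-id in the offer, all video parameter sets contained in the sprop-vps parameter in the SDP answer and all video parameter sets sent in-band in either the offerer-to-answerer direction or the answerer-to-offerer direction MUST be consistent with the first video parameter set in the sprop-vps parameter of the offer (see the semantics of sprop-vps in Section 7.1 of this document on one video parameter set being consistent with another video parameter set), and the bitstream sent in either direction MUST conform to the profile, tier, level, and constraints of the chosen sub-layer representation as indicated by the first profile_tier_level() syntax structure in the first video parameter set in the sprop-vps parameter of the offer.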

Informative note: When an offerer receives an answer that does not include recv-sub-layer-id, it has to compare payload types not declared in the offer based on the media type (i.e., video/H265) and the above media configuration parameters with any payload types it has already declared. This will enable it to determine whether the configuration in question is new or if it is equivalent to configuration already offered, since a different payload type number may be used in the answer. The ability to perform operation point selection enables a receiver to utilize the temporal scalable nature of an HEVC bitstream.

o The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and sprop-depack-buf-bytes describe the properties of an RTP stream, and all RTP streams the RTP stream depends on, when present, that the offerer or the answerer is sending for the media format configuration. This differs from the normal usage of the offer/answer parameters: normally such parameters declare the properties of the bitstream or RTP stream that the offerer or the answerer is able to receive. When dealing with HEVC, the offerer assumes that the answerer will be able to receive media encoded using the configuration being offered.

Informative note: The above parameters apply for any RTP stream and all RTP streams the RTP stream depends on, when present, sent by a declaring entity with the same configuration. In other words, the applicability of the above parameters to RTP streams depends on the source endpoint. Rather than being bound to the payload type, the values may have to be applied to another payload type when being sent, as they apply for the configuration.
o The capability parameters max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc MAY be used to declare further capabilities of the offerer or answerer for receiving. These parameters MUST NOT be present when the direction attribute is sendonly.
o The capability parameter max-fps MAY be used to declare lower capabilities of the offerer or answerer for receiving. This parameter MUST NOT be present when the direction attribute is sendonly.
o The capability parameter dec-parallel-cap MAY be used to declare additional decoding capabilities of the offerer or answerer for receiving. Upon receiving such a declaration of a receiver, a sender MAY send a bitstream to the receiver utilizing those capabilities under the assumption that the bitstream fulfills the parallelism requirement. A bitstream that is sent based on choosing a capability point with parallel tool type ‘w’ from dec-parallel-cap MUST have entropy_coding_sync_enabled_flag equal to 1 and min_spatial_segmentation_idc equal to or larger than dec-parallel-cap.spatial-seg-idc of the capability point. A bitstream that is sent based on choosing a capability point with parallel tool type ‘t’ from dec-parallel-cap MUST have entropy_coding_sync_enabled_flag equal to 0 and min_spatial_segmentation_idc equal to or larger than dec-parallel-cap.spatial-seg-idc of the capability point.
o An offerer has to include the size of the de-packetization buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff and sprop-depack-buf-nalus, in the offer for an interleaved HEVC bitstream or for the MRST or MRMT transmission mode when sprop-max-don-diff is greater than 0 for at least one of the RTP streams. To enable the offerer and answerer to inform each other about their capabilities for de-packetization buffering in receiving RTP streams, both parties are RECOMMENDED to include depack-buf-cap. For interleaved RTP streams or in MRST or MRMT, it is also RECOMMENDED to consider offering multiple payload types with different buffering requirements when the capabilities of the receiver are unknown.
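The two dec-parallel-cap conditions stated above for tool types ‘w’ and ‘t’ can be expressed as one predicate. The following is a minimal sketch; the class and function names are hypothetical, and the capability point is reduced to the two fields the rule depends on.

```python
from dataclasses import dataclass


@dataclass
class CapabilityPoint:
    tool: str             # 'w' (WPP) or 't' (tiles or slices)
    spatial_seg_idc: int  # dec-parallel-cap.spatial-seg-idc


def bitstream_fulfills(point: CapabilityPoint,
                       entropy_coding_sync_enabled_flag: int,
                       min_spatial_segmentation_idc: int) -> bool:
    """True if a bitstream may be sent based on the given capability point."""
    if point.tool not in ("w", "t"):
        raise ValueError("unknown parallel tool type: %r" % point.tool)
    # 'w' requires WPP to be enabled, 't' requires it to be disabled.
    required_flag = 1 if point.tool == "w" else 0
    return (entropy_coding_sync_enabled_flag == required_flag
            and min_spatial_segmentation_idc >= point.spatial_seg_idc)


point = CapabilityPoint(tool="t", spatial_seg_idc=8)
print(bitstream_fulfills(point, entropy_coding_sync_enabled_flag=0,
                         min_spatial_segmentation_idc=8))
```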

o The capability parameter include-dph MAY be used to declare the capability to utilize decoded picture hash SEI messages and which types of hashes in any HEVC RTP streams received by the offerer or answerer.
o The sprop-vps, sprop-sps, or sprop-pps, when present (included in the “a=fmtp” line of SDP or conveyed using the “fmtp” source attribute as specified in Section 6.3 of [RFC5576]), are used for out-of-band transport of the parameter sets (VPS, SPS, or PPS, respectively).
o The answerer MAY use either out-of-band or in-band transport of parameter sets for the bitstream it is sending, regardless of whether out-of-band parameter sets transport has been used in the offerer-to-answerer direction. Parameter sets included in an answer are independent of those parameter sets included in the offer, as they are used for decoding two different bitstreams, one from the answerer to the offerer and the other in the opposite direction. In case some RTP streams are sent before the SDP offer/answer settles down, in-band parameter sets MUST be used for those RTP stream parts sent before the SDP offer/answer.

o The following rules apply to transport of parameter sets in the offerer-to-answerer direction.

• An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps. If none of these parameters is present in the offer, then only in-band transport of parameter sets is used.
• If the level to use in the offerer-to-answerer direction is equal to the default level in the offer, the answerer MUST be prepared to use the parameter sets included in sprop-vps, sprop-sps, and sprop-pps (either included in the “a=fmtp” line of SDP or conveyed using the “fmtp” source attribute) for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams. Otherwise, the answerer MUST ignore sprop-vps, sprop-sps, and sprop-pps (either included in the “a=fmtp” line of SDP or conveyed using the “fmtp” source attribute) and the offerer MUST transmit parameter sets in-band.
• In MRST or MRMT, the answerer MUST be prepared to use the parameter sets out-of-band transmitted for the RTP stream and all RTP streams the RTP stream depends on, when present, for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams.

o The following rules apply to transport of parameter sets in the answerer-to-offerer direction.

• An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps. If none of these parameters is present in the answer, then only in-band transport of parameter sets is used.
• The offerer MUST be prepared to use the parameter sets included in sprop-vps, sprop-sps, and sprop-pps (either included in the “a=fmtp” line of SDP or conveyed using the “fmtp” source attribute) for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams.
• In MRST or MRMT, the offerer MUST be prepared to use the parameter sets out-of-band transmitted for the RTP stream and all RTP streams the RTP stream depends on, when present, for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams.

o When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using the “fmtp” source attribute as specified in Section 6.3 of [RFC5576], the receiver of the parameters MUST store the parameter sets included in sprop-vps, sprop-sps, and/or sprop-pps and associate them with the source given as part of the “fmtp” source attribute. Parameter sets associated with one source (given as part of the “fmtp” source attribute) MUST only be used to decode NAL units conveyed in RTP packets from the same source (given as part of the “fmtp” source attribute). When this mechanism is in use, SSRC collision detection and resolution MUST be performed as specified in [RFC5576].
For bitstreams being delivered over multicast, the following rules apply:
o The media format configuration is identified by profile-space, profile-id, tier-flag, level-id, interop-constraints, profile-compatibility-indicator, and tx-mode. These media format configuration parameters, including level-id, MUST be used symmetrically; that is, the answerer MUST either maintain all configuration parameters or remove the media format (payload type) completely. Note that this implies that the level-id for offer/answer in multicast is not changeable.

o To simplify the handling and matching of these configurations, the same RTP payload type number used in the offer SHOULD also be used in the answer, as specified in [RFC3264]. An answer MUST NOT contain a payload type number used in the offer unless the configuration is the same as in the offer.
o Parameter sets received MUST be associated with the originating source and MUST only be used in decoding the incoming bitstream from the same source.
o The rules for other parameters are the same as above for unicast as long as the three above rules are obeyed.
Table 1 lists the interpretation of all the parameters that MUST be used for the various combinations of offer, answer, and direction attributes. Note that the two columns wherein the recv-sub-layer-id parameter is used only apply to answers, whereas the other columns apply to both offers and answers.
Table 1. Interpretation of parameters for various combinations of offers, answers, direction attributes, with and without recv-sub-layer-id. Columns that do not indicate offer or answer apply to both.

                                    sendonly --+
      answer: recvonly, recv-sub-layer-id --+  |
        recvonly w/o recv-sub-layer-id --+  |  |
answer: sendrecv, recv-sub-layer-id --+  |  |  |
  sendrecv w/o recv-sub-layer-id --+  |  |  |  |
                                   |  |  |  |  |
profile-space                      C  D  C  D  P
profile-id                         C  D  C  D  P
tier-flag                          C  D  C  D  P
level-id                           D  D  D  D  P
interop-constraints                C  D  C  D  P
profile-compatibility-indicator    C  D  C  D  P
tx-mode                            C  C  C  C  P
max-recv-level-id                  R  R  R  R  -
sprop-max-don-diff                 P  P  -  -  P
sprop-depack-buf-nalus             P  P  -  -  P
sprop-depack-buf-bytes             P  P  -  -  P
depack-buf-cap                     R  R  R  R  -
sprop-segmentation-id              P  P  P  P  P
sprop-spatial-segmentation-idc     P  P  P  P  P
max-br                             R  R  R  R  -
max-cpb                            R  R  R  R  -
max-dpb                            R  R  R  R  -
max-lsr                            R  R  R  R  -
max-lps                            R  R  R  R  -
max-tr                             R  R  R  R  -
max-tc                             R  R  R  R  -
max-fps                            R  R  R  R  -
sprop-vps                          P  P  -  -  P
sprop-sps                          P  P  -  -  P
sprop-pps                          P  P  -  -  P
sprop-sub-layer-id                 P  P  -  -  P
recv-sub-layer-id                  X  O  X  O  -
dec-parallel-cap                   R  R  R  R  -
include-dph                        R  R  R  R  -


Legend:
C: configuration for sending and receiving bitstreams
D: changeable configuration, same as C except possible to answer with a different but consistent value (see the semantics of the six parameters related to profile, tier, and level on these parameters being consistent)
P: properties of the bitstream to be sent
R: receiver capabilities
O: operation point selection
X: MUST NOT be present
-: not usable, when present MUST be ignored
Parameters used for declaring receiver capabilities are, in general, downgradable; i.e., they express the upper limit for a sender’s possible behavior. Thus, a sender MAY select to set its encoder using only lower/lesser or equal values of these parameters.
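Table 1 lends itself to a lookup structure. The sketch below encodes each row as a five-character string, one character per column in the order the columns appear in the table, and drops the parameters marked "-" for a given column. The column labels, names, and the choice to drop unknown parameters (on the basis that the receiver MUST ignore unspecified parameters) are assumptions of this illustration.

```python
from typing import Dict

# Column order as in Table 1, left to right (labels are hypothetical).
COLUMNS = ("sendrecv", "sendrecv+recv-sub-layer-id",
           "recvonly", "recvonly+recv-sub-layer-id", "sendonly")

TABLE_1: Dict[str, str] = {}
for names, codes in (
    (("profile-space", "profile-id", "tier-flag", "interop-constraints",
      "profile-compatibility-indicator"), "CDCDP"),
    (("level-id",), "DDDDP"),
    (("tx-mode",), "CCCCP"),
    (("max-recv-level-id", "depack-buf-cap", "max-br", "max-cpb", "max-dpb",
      "max-lsr", "max-lps", "max-tr", "max-tc", "max-fps",
      "dec-parallel-cap", "include-dph"), "RRRR-"),
    (("sprop-max-don-diff", "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
      "sprop-vps", "sprop-sps", "sprop-pps", "sprop-sub-layer-id"), "PP--P"),
    (("sprop-segmentation-id", "sprop-spatial-segmentation-idc"), "PPPPP"),
    (("recv-sub-layer-id",), "XOXO-"),
):
    for name in names:
        TABLE_1[name] = codes


def usable_parameters(params: Dict[str, str], column: str) -> Dict[str, str]:
    """Drop parameters that Table 1 marks '-' for the given column."""
    index = COLUMNS.index(column)
    # Unknown parameters default to "-----", i.e. they are ignored everywhere.
    return {name: value for name, value in params.items()
            if TABLE_1.get(name, "-----")[index] != "-"}


received = {"profile-id": "1", "max-lsr": "62668800", "sprop-vps": "x"}
print(usable_parameters(received, "sendonly"))
print(usable_parameters(received, "recvonly"))
```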
When the answer does not include a recv-sub-layer-id that is less than the sprop-sub-layer-id in the offer, parameters declaring a configuration point are not changeable, with the exception of the level-id parameter for unicast usage, and these parameters express values a receiver expects to be used and MUST be used verbatim in the answer as in the offer.
When a sender’s capabilities are declared with the configuration parameters, these parameters express a configuration that is acceptable for the sender to receive bitstreams. In order to achieve high interoperability levels, it is often advisable to offer multiple alternative configurations. It is impossible to offer multiple configurations in a single payload type. Thus, when multiple configuration offers are made, each offer requires its own RTP payload type associated with the offer. However, it is possible to offer multiple operation points using one configuration in a single payload type by including sprop-vps in the offer and recv-sub-layer-id in the answer.
A receiver SHOULD understand all media type parameters, even if it only supports a subset of the payload format’s functionality. This ensures that a receiver is capable of understanding when an offer to receive media can be downgraded to what is supported by the receiver of the offer.
An answerer MAY extend the offer with additional media format configurations. However, to enable their usage, in most cases a second offer is required from the offerer to provide the bitstream property parameters that the media sender will use. This also has the effect that the offerer has to be able to receive this media format configuration, not only to send it.

7.2.3. Usage in Declarative Session Descriptions

When HEVC over RTP is offered with SDP in a declarative style, as in Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement Protocol (SAP) [RFC2974], the following considerations are necessary.

o All parameters capable of indicating both bitstream properties and receiver capabilities are used to indicate only bitstream properties. For example, in this case, the parameter profile-tier-level-id declares the values used by the bitstream, not the capabilities for receiving bitstreams. As a result, the following interpretation of the parameters MUST be used:

- Declaring actual configuration or bitstream properties:
* profile-space
* profile-id
* tier-flag
* level-id
* interop-constraints
* profile-compatibility-indicator
* tx-mode
* sprop-vps
* sprop-sps
* sprop-pps
* sprop-max-don-diff
* sprop-depack-buf-nalus
* sprop-depack-buf-bytes
* sprop-segmentation-id
* sprop-spatial-segmentation-idc

- Not usable (when present, they MUST be ignored):
* max-lps
* max-lsr
* max-cpb
* max-dpb
* max-br
* max-tr
* max-tc
* max-fps
* max-recv-level-id
* depack-buf-cap
* sprop-sub-layer-id
* dec-parallel-cap
* include-dph


o A receiver of the SDP is required to support all parameters and values of the parameters provided; otherwise, the receiver MUST reject (RTSP) or not participate in (SAP) the session. It falls on the creator of the session to use values that are expected to be supported by the receiving application.
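The declarative rules above can be sketched as a join decision. In this illustration, only the parameters listed as declaring configuration or bitstream properties are checked; all others are skipped, either because they are not usable in declarative descriptions or because they are unspecified. The `supported` mapping (parameter name to the set of values the receiving application accepts) and the function name are hypothetical.

```python
from typing import Dict, Set

# Parameters that declare actual configuration or bitstream properties.
BITSTREAM_PROPERTIES = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator", "tx-mode",
    "sprop-vps", "sprop-sps", "sprop-pps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
}


def can_join_session(declared: Dict[str, str],
                     supported: Dict[str, Set[str]]) -> bool:
    """False means: reject (RTSP) or do not participate in (SAP) the session."""
    for name, value in declared.items():
        if name not in BITSTREAM_PROPERTIES:
            continue  # not usable in declarative SDP, or unspecified: ignored
        if value not in supported.get(name, set()):
            return False
    return True


supported = {"profile-id": {"1"}, "level-id": {"93", "120"}}
print(can_join_session({"profile-id": "1", "level-id": "120", "max-lsr": "999"},
                       supported))
```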

7.2.4. Considerations for Parameter Sets

When out-of-band transport of parameter sets is used, parameter sets MAY still be additionally transported in-band unless explicitly disallowed by an application, and some of these additional parameter sets may update some of the out-of-band transported parameter sets. Update of a parameter set refers to the sending of a parameter set of the same type using the same parameter set ID but with different values for at least one other parameter of the parameter set.

7.2.5. Dependency Signaling in Multi-Stream Mode

If MRST or MRMT is used, the rules on signaling media decoding dependency in SDP as defined in [RFC5583] apply. The rules on “hierarchical or layered encoding” with multicast in Section 5.7 of [RFC4566] do not apply. This means that the notation for Connection Data “c=” SHALL NOT be used with more than one address, i.e., the sub-field <number of addresses> in the sub-field <connection-address> of the “c=” field, described in [RFC4566], must not be present. The order of session dependency is given from the RTP stream containing the lowest temporal sub-layer to the RTP stream containing the highest temporal sub-layer.

8 Use with Feedback Messages

The following subsections define the use of the Picture Loss Indication (PLI), Slice Loss Indication (SLI), Reference Picture Selection Indication (RPSI), and Full Intra Request (FIR) feedback messages with HEVC. The PLI, SLI, and RPSI messages are defined in [RFC4585], and the FIR message is defined in [RFC5104].

8.1. Picture Loss Indication (PLI)

As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a media sender indicates “the loss of an undefined amount of coded video data belonging to one or more pictures”. Without having any specific knowledge of the setup of the bitstream (such as use and location of in-band parameter sets, non-IDR decoder refresh points, picture structures, and so forth), a reaction to the reception of a PLI by an HEVC sender SHOULD be to send an IDR picture and relevant parameter sets, potentially with sufficient redundancy so as to ensure correct reception. However, sometimes information about the bitstream structure is known. For example, state could have been established outside of the mechanisms defined in this document that parameter sets are conveyed out of band only, and stay static for the duration of the session. In that case, it is unnecessary to send them in-band as a result of the reception of a PLI. Other examples could be devised based on a priori knowledge of different aspects of the bitstream structure. In all cases, the timing and congestion control mechanisms of RFC 4585 MUST be observed.

8.2. Slice Loss Indication (SLI)

The SLI described in RFC 4585 can be used to indicate, to a sender, the loss of a number of Coding Tree Blocks (CTBs) in a CTB raster scan order of a picture. In the SLI’s Feedback Control Indication (FCI) field, the subfield “First” MUST be set to the CTB address of the first lost CTB. Note that the CTB address is in CTB-raster-scan order of a picture. For the first CTB of a slice segment, the CTB address is the value of slice_segment_address when present, or 0 when the value of first_slice_segment_in_pic_flag is equal to 1; both syntax elements are in the slice segment header. The subfield “Number” MUST be set to the number of consecutive lost CTBs, again in CTB-raster-scan order of a picture. Note that because both “First” and “Number” are counted in CTBs in CTB-raster-scan order of a picture, not in tile-scan order (which is the bitstream order of CTBs), multiple SLI messages may be needed to report the loss of one tile covering multiple CTB rows but less wide than the picture.
The subfield “PictureID” MUST be set to the 6 least significant bits of a binary representation of the value of PicOrderCntVal, as defined in [HEVC], of the picture for which the lost CTBs are indicated. Note that for IDR pictures the syntax element slice_pic_order_cnt_lsb is not present, but then the value is inferred to be equal to 0.
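The three subfields can be packed into the 32-bit SLI FCI entry of RFC 4585, which allots 13 bits to “First”, 13 bits to “Number”, and 6 bits to “PictureID”. The sketch below also shows why a tile narrower than the picture needs one entry per CTB row. The function names and the tile geometry in the usage lines are hypothetical, and merging of contiguous runs (for a tile as wide as the picture) is left out.

```python
import struct
from typing import List, Tuple


def sli_fci(first: int, number: int, pic_order_cnt_val: int) -> bytes:
    """Pack one SLI FCI entry: First (13 bits), Number (13), PictureID (6)."""
    if not 0 <= first < (1 << 13):
        raise ValueError("First does not fit in 13 bits")
    if not 1 <= number < (1 << 13):
        raise ValueError("Number must be in 1..8191")
    picture_id = pic_order_cnt_val & 0x3F  # 6 least significant bits
    word = (first << 19) | (number << 6) | picture_id
    return struct.pack("!I", word)         # network byte order


def lost_tile_runs(pic_width_in_ctbs: int, tile_col: int, tile_row: int,
                   tile_width: int, tile_height: int) -> List[Tuple[int, int]]:
    """Return one (First, Number) run in CTB raster scan per CTB row of a tile."""
    return [((tile_row + row) * pic_width_in_ctbs + tile_col, tile_width)
            for row in range(tile_height)]


# Hypothetical geometry: picture 30 CTBs wide; lost tile starts at column 10,
# row 0, and covers 10 x 2 CTBs. Two SLI entries are needed.
runs = lost_tile_runs(30, 10, 0, 10, 2)
print(runs)  # [(10, 10), (40, 10)]
entries = [sli_fci(first, number, pic_order_cnt_val=70) for first, number in runs]
print(entries[0].hex())  # 00500286
```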
As described in RFC 4585, an encoder in a media sender can use this information to “clean up” the corrupted picture by sending intra information, while observing the constraints described in RFC 4585, for example, with respect to congestion control. In many cases, error tracking is required to identify the corrupted region in the receiver’s state (reference pictures), because motion compensation imports errors into otherwise uncorrupted regions of the picture. Reference-picture selection can also be used to “clean up” the corrupted picture, which is usually more efficient and less likely to generate congestion than sending intra information.
In contrast to the video codecs contemplated in RFCs 4585 and 5104 [RFC5104], in HEVC, the “macroblock size” is not fixed to 16x16 luma samples, but is variable. That, however, does not create a conceptual difficulty with SLI, because the setting of the CTB size is a sequence-level functionality, and using a slice loss indication across CVS boundaries is meaningless as there is no prediction across sequence boundaries. However, a proper use of SLI messages is not as straightforward as it was with older, fixed-macroblock-sized video codecs, as the state of the sequence parameter set (where the CTB size is located) has to be taken into account when interpreting the “First” subfield in the FCI.
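The SLI rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the 13/13/6-bit layout of “First”, “Number”, and “PictureID” is taken from the SLI definition in RFC 4585, and the CTB size is assumed to have been read from the active sequence parameter set.

```python
import struct


def width_in_ctbs(pic_width_luma: int, ctb_size: int) -> int:
    """Picture width in CTBs; ctb_size comes from the active SPS."""
    return (pic_width_luma + ctb_size - 1) // ctb_size


def sli_fci(first: int, number: int, pic_order_cnt_val: int) -> bytes:
    """Pack one SLI FCI entry: First (13 bits), Number (13 bits), PictureID (6 bits)."""
    if not (0 <= first < (1 << 13) and 0 < number < (1 << 13)):
        raise ValueError("First/Number do not fit in 13 bits")
    picture_id = pic_order_cnt_val & 0x3F  # 6 least significant bits of PicOrderCntVal
    return struct.pack("!I", (first << 19) | (number << 6) | picture_id)


def sli_for_tile(col0, col1, row0, row1, ctbs_per_row, pic_order_cnt_val):
    """One SLI entry per CTB row of a lost tile (inclusive CTB coordinates).

    A tile narrower than the picture is not contiguous in raster-scan
    order, so each of its CTB rows needs its own First/Number pair.
    """
    return [
        sli_fci(row * ctbs_per_row + col0, col1 - col0 + 1, pic_order_cnt_val)
        for row in range(row0, row1 + 1)
    ]
```

For a 1920-sample-wide picture with 64x64 CTBs, a lost tile covering the left half of the first two CTB rows would be reported as `sli_for_tile(0, 14, 0, 1, width_in_ctbs(1920, 64), poc)`, which yields two FCI entries.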

8.3. Reference Picture Selection Indication (RPSI)
Feedback-based reference picture selection has been shown to be a powerful tool to stop temporal error propagation for improved error resilience [Girod99][Wang05]. In one approach, the decoder side tracks errors in the decoded pictures and informs the encoder side that a particular picture that was decoded earlier is correct and still present in the decoded picture buffer; it requests the encoder to use that correct picture-availability information when encoding the next picture, so as to stop further temporal error propagation. For this approach, the decoder side should use the RPSI feedback message.
Encoders can encode some long-term reference pictures as specified in H.264 or HEVC for purposes described in the previous paragraph without the need of a huge decoded picture buffer. As shown in [Wang05], with a flexible reference picture management scheme, as in H.264 and HEVC, even a decoded picture buffer size of two picture storage buffers would work for the approach described in the previous paragraph.
The field “Native RPSI bit string defined per codec” is a base16 [RFC4648] representation of the 8 bits consisting of the 2 most significant bits equal to 0 and 6 bits of nuh_layer_id, as defined in [HEVC], followed by the 32 bits representing the value of the PicOrderCntVal (in network byte order), as defined in [HEVC], for the picture that is indicated by the RPSI feedback message.
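As a rough illustration of the layout just described, the 40-bit native RPSI bit string can be assembled as follows. The function name is hypothetical, and the sketch reads the text as a plain 5-byte field (one byte carrying two zero bits plus nuh_layer_id, then a signed 32-bit PicOrderCntVal in network byte order); the surrounding RPSI fields of RFC 4585 (padding count, payload type) are deliberately left out.

```python
import struct


def native_rpsi_bit_string(nuh_layer_id: int, pic_order_cnt_val: int) -> bytes:
    """Return the 5-byte (40-bit) native RPSI bit string for HEVC."""
    if not 0 <= nuh_layer_id < 64:
        raise ValueError("nuh_layer_id is a 6-bit value")
    # "!" selects network byte order without padding:
    # B = 1 byte (2 zero bits + 6-bit layer id), i = signed 32-bit PicOrderCntVal.
    return struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
```

Calling `.hex()` on the result gives a hexadecimal rendering that is convenient for logging.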
The use of the RPSI feedback message as positive acknowledgement with HEVC is deprecated. In other words, the RPSI feedback message MUST only be used as a reference picture selection request, such that it can also be used in multicast.

8.4. Full Intra Request (FIR)
The purpose of the FIR message is to force an encoder to send an independent decoder refresh point as soon as possible (observing, for example, the congestion-control-related constraints set out in RFC 5104).
Upon reception of a FIR, a sender MUST send an IDR picture. Parameter sets MUST also be sent, except when there is a priori knowledge that the parameter sets have been correctly established. A typical example for that is an understanding between sender and receiver, established by means outside this document, that parameter sets are exclusively sent out-of-band.
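The sender-side decision can be summarized in a short sketch. The function name and the string labels are purely illustrative; how a sender learns that parameter sets are already established (for example, exclusive out-of-band delivery) is assumed to be decided elsewhere.

```python
from typing import List


def units_to_send_on_fir(parameter_sets_established: bool) -> List[str]:
    """List what a sender emits in response to a FIR, in sending order."""
    # Parameter sets are skipped only with a priori knowledge that the
    # receiver already holds them; the IDR picture is always sent.
    units = [] if parameter_sets_established else ["VPS", "SPS", "PPS"]
    units.append("IDR")
    return units
```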

9 Security Considerations

The scope of this Security Considerations section is limited to the payload format itself and to one feature of HEVC that may pose a particularly serious security risk if implemented naively. The payload format, in isolation, does not form a complete system. Implementers are advised to read and understand relevant security-related documents, especially those pertaining to RTP (see the Security Considerations section in [RFC3550]), and the security of the call-control stack chosen (that may make use of the media type registration of this memo). Implementers should also consider known security vulnerabilities of video coding and decoding implementations in general and avoid those.
Within this RTP payload format, and with the exception of the user data SEI message as described below, no security threats other than those common to RTP payload formats are known. In other words, neither the various media-plane-based mechanisms, nor the signaling part of this memo, seems to pose a security risk beyond those common to all RTP-based systems.
RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, as “Securing the RTP Framework: Why RTP Does Not Mandate a Single Media Security Solution” [RFC7202] discusses, it is not an RTP payload format’s responsibility to discuss or mandate what solutions are used to meet the basic security goals like confidentiality, integrity, and source authenticity for RTP in general. This responsibility lies with anyone using RTP in an application. They can find guidance on available security mechanisms and important considerations in “Options for Securing RTP Sessions” [RFC7201]. Applications SHOULD use one or more appropriate strong security mechanisms. The rest of this section discusses the security-impacting properties of the payload format itself.

Because the data compression used with this payload format is applied end-to-end, any encryption needs to be performed after compression. A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the bitstream that are complex to decode and that cause the receiver to be overloaded. H.265 is particularly vulnerable to such attacks, as it is extremely simple to generate datagrams containing NAL units that affect the decoding process of many future NAL units. Therefore, the usage of data origin authentication and data integrity protection of at least the RTP packet is RECOMMENDED, for example, with SRTP [RFC3711].
Like [H.264], HEVC includes a user data Supplemental Enhancement Information (SEI) message. This SEI message allows inclusion of an arbitrary bitstring into the video bitstream. Such a bitstring could include JavaScript, machine code, and other active content. HEVC leaves the handling of this SEI message to the receiving system. In order to avoid harmful side effects of the user data SEI message, decoder implementations cannot naively trust its content. For example, it would be a bad and insecure implementation practice to forward any JavaScript a decoder implementation detects to a web browser. The safest way to deal with user data SEI messages is to simply discard them, but that can have negative side effects on the quality of experience by the user.
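The “simply discard them” option can be sketched as a filter over NAL units. The sketch below rests on several assumptions: it operates on complete NAL units (before packetization or after depacketization), it uses the HEVC NAL unit types 39 and 40 for prefix and suffix SEI and the SEI payload types 4 and 5 for registered and unregistered user data, and it inspects only the first SEI message in each SEI NAL unit. A user data message that follows another message in the same NAL unit would not be caught, so a production filter would need a full SEI parser. All function names are hypothetical.

```python
SEI_NAL_TYPES = {39, 40}         # prefix SEI, suffix SEI
USER_DATA_PAYLOAD_TYPES = {4, 5}  # registered (ITU-T T.35), unregistered


def nal_unit_type(nal: bytes) -> int:
    """NAL unit type: 6 bits following the forbidden_zero_bit."""
    return (nal[0] >> 1) & 0x3F


def first_sei_payload_type(nal: bytes):
    """Payload type of the first SEI message, or None if the unit is truncated."""
    payload_type, pos = 0, 2  # skip the 2-byte NAL unit header
    while pos < len(nal) and nal[pos] == 0xFF:
        payload_type += 255   # each 0xFF byte extends the value by 255
        pos += 1
    if pos >= len(nal):
        return None
    return payload_type + nal[pos]


def drop_user_data_sei(nal_units):
    """Return the NAL units with user data SEI units removed."""
    kept = []
    for nal in nal_units:
        is_sei = len(nal) >= 2 and nal_unit_type(nal) in SEI_NAL_TYPES
        if is_sei and first_sei_payload_type(nal) in USER_DATA_PAYLOAD_TYPES:
            continue  # never hand the arbitrary bitstring to the application
        kept.append(nal)
    return kept
```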
End-to-end security with authentication, integrity, or confidentiality protection will prevent a MANE from performing media-aware operations other than discarding complete packets. In the case of confidentiality protection, it will even be prevented from discarding packets in a media-aware way. To be allowed to perform such operations, a MANE is required to be a trusted entity that is included in the security context establishment.

10 Congestion Control

Congestion control for RTP SHALL be used in accordance with RTP [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. If best-effort service is being used, an additional requirement is that users of this payload format MUST monitor packet loss to ensure that the packet loss rate is within an acceptable range. Packet loss is considered acceptable if a TCP flow across the same network path, and experiencing the same network conditions, would achieve an average throughput, measured on a reasonable timescale, that is not less than the throughput that all RTP streams combined are achieving. This condition can be satisfied by implementing congestion-control mechanisms to adapt the transmission rate or the number of layers subscribed for a layered multicast session, or by arranging for a receiver to leave the session if the loss rate is unacceptably high.
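The text does not say how to estimate what a TCP flow “would achieve”. One common back-of-the-envelope choice, used here purely as an assumption, is the simplified TCP throughput approximation of Mathis et al., rate ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p), where p is the packet loss rate. The function names and parameters below are hypothetical.

```python
import math


def tcp_throughput_estimate_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Simplified (Mathis et al.) estimate of steady-state TCP throughput."""
    if loss_rate <= 0:
        return math.inf  # the approximation is unbounded without loss
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss_rate)


def loss_is_acceptable(rtp_bitrate_bps: float, mss_bytes: int,
                       rtt_s: float, loss_rate: float) -> bool:
    """True if a comparable TCP flow would get at least the combined RTP bitrate."""
    return tcp_throughput_estimate_bps(mss_bytes, rtt_s, loss_rate) >= rtp_bitrate_bps
```

With 1200-byte segments, a 100 ms round-trip time, and 1% loss, the estimate is roughly 1.18 Mbit/s, so a combined RTP bitrate of 1 Mbit/s would pass the check and 2 Mbit/s would not.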
The bitrate adaptation necessary for obeying the congestion control principle is easily achievable when real-time encoding is used, for example, by adequately tuning the quantization parameter.
However, when pre-encoded content is being transmitted, bandwidth adaptation requires the pre-encoded bitstream to be tailored for such adaptivity. The key mechanism available in HEVC is temporal scalability. A media sender can remove NAL units belonging to higher temporal sub-layers (i.e., those NAL units with a high value of TID) until the sending bitrate drops to an acceptable range. HEVC contains mechanisms that allow the lightweight identification of switching points in temporal enhancement layers, as discussed in Section 1.1.2 of this memo. An HEVC media sender can send packets belonging to NAL units of temporal enhancement layers starting from these switching points to probe for available bandwidth and to utilize bandwidth that has been shown to be available.
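A minimal sketch of this sub-layer thinning, assuming the sender works on complete NAL units before packetization and knows the duration of the window they cover. TID is read from the three least significant bits of the second NAL unit header byte (nuh_temporal_id_plus1 minus 1); the function names are hypothetical, and switching-point handling for moving back up is not shown.

```python
def temporal_id(nal: bytes) -> int:
    """TID = nuh_temporal_id_plus1 - 1 (low 3 bits of the second header byte)."""
    return (nal[1] & 0x07) - 1


def thin_to_bitrate(nal_units, duration_s: float, target_bps: float):
    """Lower the highest forwarded TID until the bitrate fits the target.

    Returns (max_tid, kept_nal_units). The base sub-layer (TID 0) is
    never removed, even if it alone exceeds the target.
    """
    if not nal_units:
        return 0, []
    max_tid = max(temporal_id(n) for n in nal_units)
    while True:
        kept = [n for n in nal_units if temporal_id(n) <= max_tid]
        bitrate = sum(len(n) for n in kept) * 8 / duration_s
        if bitrate <= target_bps or max_tid == 0:
            return max_tid, kept
        max_tid -= 1
```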
The above mechanisms generally work within a defined profile and level and, therefore, no renegotiation of the channel is required. Only when non-downgradable parameters (such as profile) are required to be changed does it become necessary to terminate and restart the RTP stream(s). This may be accomplished by using different RTP payload types.
MANEs MAY remove certain unusable packets from the RTP stream when that RTP stream was damaged due to previous packet losses. This can help reduce the network load in certain special cases. For example, MANEs can remove those FUs where the leading FUs belonging to the same NAL unit have been lost or those dependent slice segments when the leading slice segments belonging to the same slice have been lost, because the trailing FUs or dependent slice segments are meaningless to most decoders. MANEs can also remove higher temporal scalable layers if the outbound transmission (from the MANE’s viewpoint) experiences congestion.
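The FU case can be sketched as a small filter. It assumes packets of a single RTP stream arrive in sequence-number order without reordering, that a payload with type 49 in its payload header is an FU, and that the FU header is the third payload byte with the S (start) bit as its most significant bit. The function name is hypothetical, and dependent slice segments are not handled because detecting them requires parsing slice segment headers.

```python
FU_TYPE = 49  # payload header type that marks a fragmentation unit


def drop_orphaned_fus(packets):
    """Filter (seq, payload) pairs, dropping FUs whose earlier fragments were lost."""
    forwarded = []
    prev_seq = None
    fu_intact = False  # True while every fragment since the FU start arrived
    for seq, payload in packets:
        gap = prev_seq is not None and seq != ((prev_seq + 1) & 0xFFFF)
        prev_seq = seq
        if (payload[0] >> 1) & 0x3F == FU_TYPE:
            if payload[2] & 0x80:   # S bit: a new fragmented NAL unit begins
                fu_intact = True
            elif gap:               # a fragment of this NAL unit went missing
                fu_intact = False
            if not fu_intact:
                continue            # trailing fragment is useless downstream
        else:
            fu_intact = False       # any non-FU packet ends the current FU
        forwarded.append((seq, payload))
    return forwarded
```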
