0x00 WebRTC RTP Header Extension Format
RFC 3550 (the RTP specification), section 5.3.1, defines the structure of the RTP header extension as shown below:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      defined by profile       |           length              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        header extension                       |
|                             ....                              |
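If the X bit in the fixed RTP header is 1, a header extension follows the CSRC list. In the structure above, the first 16 bits serve as a profile-defined identifier and the next 16 bits give the length of the header extension, counted in 32-bit words and excluding this 4-byte extension header itself.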
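To make the X bit and CSRC offset rule concrete, here is a minimal sketch of locating the header extension in a raw RTP packet. The names `HeaderExtensionInfo` and `LocateHeaderExtension` are made up for illustration and do not come from any library; the sketch assumes the buffer starts at the first byte of the RTP header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: where the extension data sits inside the packet.
struct HeaderExtensionInfo {
    uint16_t profile;   // the 16-bit "defined by profile" field
    size_t dataOffset;  // offset of the first extension data byte
    size_t dataLen;     // extension data length in bytes
};

// Returns true when the packet carries a well-formed header extension.
bool LocateHeaderExtension(const uint8_t* pkt, size_t size,
                           HeaderExtensionInfo* info) {
    if (size < 12) return false;                 // fixed RTP header is 12 bytes
    if ((pkt[0] >> 6) != 2) return false;        // RTP version must be 2
    if ((pkt[0] & 0x10) == 0) return false;      // X bit clear: no extension
    const size_t csrcCount = pkt[0] & 0x0F;      // CC field
    const size_t extStart = 12 + 4 * csrcCount;  // extension follows the CSRC list
    if (size < extStart + 4) return false;
    info->profile =
        static_cast<uint16_t>((pkt[extStart] << 8) | pkt[extStart + 1]);
    // The length field counts 32-bit words after the 4-byte extension header.
    info->dataLen =
        4 * static_cast<size_t>((pkt[extStart + 2] << 8) | pkt[extStart + 3]);
    info->dataOffset = extStart + 4;
    return size >= info->dataOffset + info->dataLen;
}
```

The returned `profile` value is what distinguishes the generic RFC 3550 form from the one-byte (0xBEDE) and two-byte (0x100x) forms.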
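This definition has two shortcomings: ① an RTP packet can carry only one header extension; ② it does not say how the 16-bit header extension identifier should be allocated to avoid collisions.
To address these two shortcomings, RFC 5285 (since obsoleted by RFC 8285) extends the header extension and supports two forms: One-byte Header and Two-byte Header.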
One-byte Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0xBE      |     0xDE      |           length=3            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | L=0   |     data      |  ID   |  L=1  |   data...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ...data   |    0 (pad)    |    0 (pad)    |  ID   | L=3   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              data                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
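The first 16 bits after the RTP header are fixed to 0xBEDE, which marks a one-byte header extension. length=3 means the extension data that follows is 3 * 32 bits = 96 bits = 12 bytes long. Each extension element starts with one byte: the upper 4 bits are the element ID and the lower 4 bits (L) hold the data length minus 1. L=0 therefore means 1 byte of data follows, and L=1 on the second element means 2 bytes of data follow. Note that the third element does not come right after the second: 2 zero bytes are inserted first as padding, because the header extension is padded to a 32-bit boundary.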
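The parsing rules above can be sketched as a small loop. This is an illustrative sketch, not the parser of any particular stack: `ExtensionElement` and `ParseOneByteExtensions` are hypothetical names, and the input is assumed to start at the 0xBEDE marker. It treats ID 0 as a single padding byte and stops at the reserved ID 15.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One parsed element: its ID and a view into the data bytes (not a copy).
struct ExtensionElement {
    uint8_t id;
    const uint8_t* data;
    size_t len;
};

// Parses a one-byte header extension block that starts at the 0xBEDE marker.
// Returns false when the block is truncated or does not start with 0xBEDE.
bool ParseOneByteExtensions(const uint8_t* buf, size_t size,
                            std::vector<ExtensionElement>* out) {
    if (size < 4 || buf[0] != 0xBE || buf[1] != 0xDE) return false;
    // The length field counts 32-bit words after the 4-byte header.
    const size_t total = static_cast<size_t>((buf[2] << 8) | buf[3]) * 4;
    if (size < 4 + total) return false;
    const uint8_t* p = buf + 4;
    const uint8_t* end = p + total;
    while (p < end) {
        const uint8_t id = *p >> 4;
        const size_t len = (*p & 0x0F) + 1;  // L stores (data length - 1)
        if (id == 15) break;                 // reserved ID: stop parsing
        ++p;
        if (id == 0) continue;               // padding byte, skip it
        if (static_cast<size_t>(end - p) < len) return false;
        out->push_back(ExtensionElement{id, p, len});
        p += len;
    }
    return true;
}
```

Fed the 12 bytes of the diagram above, the loop yields three elements with data lengths 1, 2 and 4, skipping the two padding bytes in between.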
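Two-byte Header
In the two-byte form, the first 16 bits after the RTP header are laid out as shown below: the fixed 12-bit pattern 0x100 followed by 4 appbits. The appbits can carry application-level data.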
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         0x100         |appbits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
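An example follows. The block starts with 0x100 + 0x0 (appbits = 0), then length=3 indicates that three 32-bit words follow, and after that come the extension elements and their data. Compared with the one-byte header, ID and L each grow from 4 bits to 8 bits. Per RFC 8285 section 4.3, L in the two-byte form is the actual data length, with no +1 adjustment as in the one-byte form.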
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0x10      |     0x00      |           length=3            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      ID       |      L=0      |      ID       |      L=1      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     data      |    0 (pad)    |      ID       |      L=4      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              data                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
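For comparison with the one-byte form, the following is a rough sketch of a two-byte parser. `TwoByteElement` and `ParseTwoByteExtensions` are hypothetical names chosen for this example; the input is assumed to start at the 0x100 + appbits field, and a zero ID byte is treated as padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One parsed element; len may legitimately be 0 in the two-byte form.
struct TwoByteElement {
    uint8_t id;
    const uint8_t* data;
    size_t len;
};

// Parses a two-byte header extension block starting at the 0x100+appbits field.
bool ParseTwoByteExtensions(const uint8_t* buf, size_t size,
                            std::vector<TwoByteElement>* out) {
    if (size < 4) return false;
    const uint16_t profile = static_cast<uint16_t>((buf[0] << 8) | buf[1]);
    if ((profile & 0xFFF0) != 0x1000) return false;  // upper 12 bits must be 0x100
    // The length field counts 32-bit words after the 4-byte header.
    const size_t total = static_cast<size_t>((buf[2] << 8) | buf[3]) * 4;
    if (size < 4 + total) return false;
    const uint8_t* p = buf + 4;
    const uint8_t* end = p + total;
    while (p < end) {
        if (*p == 0) { ++p; continue; }     // padding byte
        if (end - p < 2) return false;      // need both ID and L
        const uint8_t id = p[0];
        const size_t len = p[1];            // actual data length, no +1
        p += 2;
        if (static_cast<size_t>(end - p) < len) return false;
        out->push_back(TwoByteElement{id, p, len});
        p += len;
    }
    return true;
}
```

The only differences from the one-byte loop are the marker check, the 8-bit ID and L fields, and the missing +1 on the length.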
- Notes:
- RFC 8285 allows one-byte and two-byte header extensions to appear in the same RTP stream; this must be declared in the SDP with 'a=extmap-allow-mixed';
- A single RTP packet can use only one of the two forms, either the one-byte or the two-byte header extension.
0x01 RTP Header Extension in SDP
In SDP, 'a=extmap' describes an RTP header extension. Its format is:
a=extmap:<value>["/"<direction>] <URI> <extensionattributes>
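Where:
- <value> is the ID of the header extension. In the one-byte header, 0 is reserved for padding and 15 is reserved as a stop indicator;
- <direction> is the transmission direction; valid values are "sendonly", "recvonly", "sendrecv" and "inactive";
- <URI> is the URI that describes the header extension:
- URIs registered with IANA look like urn:ietf:params:rtp-hdrext:avt-example-metadata
- URIs not registered with IANA look like http://example.com/082005/ext.htm#example-metadata
Example: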
a=extmap:1 http://example.com/082005/ext.htm#ttime
a=extmap:2/sendrecv http://example.com/082005/ext.htm#xmeta
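As an illustration of the 'a=extmap' grammar, here is a minimal sketch that splits one such line into its parts. `ExtMap` and `ParseExtMap` are hypothetical names; the sketch only checks the overall shape of the line and does not validate the direction keyword or the ID range.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for the fields of one a=extmap line.
struct ExtMap {
    int id = 0;
    std::string direction;   // empty when "/<direction>" is omitted
    std::string uri;
    std::string attributes;  // empty when no extension attributes are present
};

// Splits "a=extmap:<value>[/<direction>] <URI> [<extensionattributes>]".
bool ParseExtMap(const std::string& line, ExtMap* out) {
    const std::string prefix = "a=extmap:";
    if (line.compare(0, prefix.size(), prefix) != 0) return false;
    std::istringstream in(line.substr(prefix.size()));
    std::string value;
    if (!(in >> value >> out->uri)) return false;

    // <value> may carry an optional "/<direction>" suffix.
    const std::string::size_type slash = value.find('/');
    const std::string idPart = value.substr(0, slash);
    if (idPart.empty() || idPart.size() > 4) return false;
    int id = 0;
    for (char c : idPart) {
        if (c < '0' || c > '9') return false;
        id = id * 10 + (c - '0');
    }
    out->id = id;
    out->direction =
        (slash == std::string::npos) ? std::string() : value.substr(slash + 1);

    // Whatever remains on the line is treated as extension attributes.
    out->attributes.clear();
    std::getline(in >> std::ws, out->attributes);
    return true;
}
```

For the second example line above, this yields id 2, direction "sendrecv" and the URI http://example.com/082005/ext.htm#xmeta.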
0x02 RTP Header Extensions Supported by WebRTC
The table below lists the URIs of several header extensions commonly used in WebRTC:
URI | Link |
---|---|
urn:ietf:params:rtp-hdrext:sdes:mid | https://tools.ietf.org/html/rfc7941 |
urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id | https://tools.ietf.org/html/draft-ietf-avtext-rid-09 |
urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id | https://tools.ietf.org/html/draft-ietf-avtext-rid-09 |
http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time | https://webrtc.googlesource.com/src/+/refs/heads/master/docs/native-code/rtp-hdrext/abs-send-time |
http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01 | https://tools.ietf.org/html/draft-holmer-rmcat-transport-wide-cc-extensions-01 |
urn:ietf:params:rtp-hdrext:framemarking | https://tools.ietf.org/html/draft-ietf-avtext-framemarking-07 |
urn:ietf:params:rtp-hdrext:ssrc-audio-level | https://tools.ietf.org/html/rfc6464 |
urn:3gpp:video-orientation | http://www.3gpp.org/ftp/Specs/html-info/26114.htm |
urn:ietf:params:rtp-hdrext:toffset | https://tools.ietf.org/html/rfc5450 |
① urn:ietf:params:rtp-hdrext:sdes:mid
MID: This is a media description identifier that matches the value of the SDP [RFC4566] a=mid attribute [RFC5888], to associate RTP streams multiplexed on the same transport with their respective SDP media description.
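In a unified SDP description, 'a=mid' is a required element of every audio/video m-line. This header extension carries the value that follows 'a=mid' in the SDP. It identifies which media description an RTP packet belongs to and can serve as the unique identifier of a media section.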
② urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
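RFC 7656 defines two concepts, 'Media Source' and 'RTP Stream':
- A Media Source is roughly equivalent to a Track in WebRTC; in the SDP description it can be uniquely identified by mid;
- An RTP Stream is the smallest stream unit of RTP transport. In Simulcast or SVC scenarios, for example, one Media Source contains several RTP Streams, and the SDP uses 'a=rid' to describe each RTP Stream.
The draft https://tools.ietf.org/html/draft-ietf-avtext-rid-09 defines two new RTCP SDES items. Using the mechanism from RFC 7941, this information can also be carried in an RTP header extension.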
③ urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
Same as ②, but used on retransmission streams to declare the rid of the stream being retransmitted.
④ http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
Wire format: 1-byte extension, 3 bytes of data. total 4 bytes extra per packet (plus shared 4 bytes for all extensions present: 2 byte magic word 0xBEDE, 2 byte # of extensions). Will in practice replace the “toffset” extension so we should see no long term increase in traffic as a result.
Encoding: Timestamp is in seconds, 24 bit 6.18 fixed point, yielding 64s wraparound and 3.8us resolution (one increment for each 477 bytes going out on a 1Gbps interface).
Relation to NTP timestamps: abs_send_time_24 = (ntp_timestamp_64 >> 14) & 0x00ffffff ; NTP timestamp is 32 bits for whole seconds, 32 bits fraction of second.
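abs-send-time is a 3-byte timestamp.
mediasoup converts milliseconds to abs-send-time as follows:
#include <cstdint>

// 6.18 fixed point: ms * 2^18 / 1000, rounded to nearest, truncated to 24 bits.
static uint32_t TimeMsToAbsSendTime(uint64_t ms)
{
    return static_cast<uint32_t>(((ms << 18) + 500) / 1000) & 0x00FFFFFF;
}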
The REMB calculation in the GCC module depends on the abs-send-time RTP header extension, which records the absolute time at which each RTP packet left the sender.
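Because the 24-bit value wraps every 64 seconds, a receiver comparing two abs-send-time values has to allow for wraparound. The sketch below shows the NTP relation quoted above plus a wrap-tolerant difference. The function names are invented for this example, and the sketch assumes the two packets were sent less than 32 seconds apart.

```cpp
#include <cassert>
#include <cstdint>

// abs_send_time_24 = (ntp_timestamp_64 >> 14) & 0x00ffffff
inline uint32_t NtpToAbsSendTime(uint64_t ntp) {
    return static_cast<uint32_t>(ntp >> 14) & 0x00FFFFFF;
}

// Signed difference (later - earlier) in 6.18 fixed-point units.
// Assumes the true gap is under half the 64 s wrap period.
inline int32_t AbsSendTimeDelta(uint32_t later, uint32_t earlier) {
    const uint32_t diff = (later - earlier) & 0x00FFFFFF;
    // Interpret the 24-bit result as a signed value.
    if (diff & 0x00800000) return static_cast<int32_t>(diff) - 0x01000000;
    return static_cast<int32_t>(diff);
}

// Converts a 6.18 fixed-point delta to milliseconds.
inline double AbsSendTimeDeltaToMs(int32_t delta) {
    return delta * 1000.0 / (1 << 18);
}
```

With these helpers, one second of NTP time (1 << 32) maps to 1 << 18 abs-send-time units, and a pair of values that straddles the wrap still gives a small positive delta.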
⑤ http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
The IETF set up the RMCAT (RTP Media Congestion Avoidance Techniques) working group to define congestion control mechanisms on top of RTP.
The figure below shows the layout of holmer-rmcat-transport-wide-cc-extensions:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0xBE      |     0xDE      |           length=1            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | L=1   |transport-wide sequence number | zero padding  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
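As shown, the useful data is only 16 bits: a sequence number called the transport-wide sequence number.
When sending an RTP packet, the sender sets the transport-wide sequence number (TransportSequenceNumber) in the RTP header extension. When the packet arrives, the receiver records this sequence number together with the arrival time, builds a TransportCC feedback message from these records, and returns it to the sender. The sender parses the feedback and runs the send-side BWE algorithm to obtain a delay-based bitrate. Finally, this delay-based bitrate (Ar) is compared with the loss-based bitrate to produce the final target bitrate.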
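The sender-side step of writing the sequence number can be sketched as building the 8-byte block from the diagram above. `BuildTransportCcExtension` and `ReadTransportSeq` are hypothetical helpers, and the sketch assumes this is the only extension in the packet and that `id` is the value negotiated via 'a=extmap' (1 to 14 for the one-byte form).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the complete one-byte header extension block carrying only the
// transport-wide sequence number. `id` must be in the range 1..14.
inline std::array<uint8_t, 8> BuildTransportCcExtension(uint8_t id,
                                                        uint16_t seq) {
    return {{0xBE, 0xDE,                              // one-byte form marker
             0x00, 0x01,                              // length = 1 word
             static_cast<uint8_t>((id << 4) | 0x01),  // ID, L=1 (2 data bytes)
             static_cast<uint8_t>(seq >> 8),          // sequence number, MSB
             static_cast<uint8_t>(seq & 0xFF),        // sequence number, LSB
             0x00}};                                  // zero padding
}

// Reads the sequence number back from a block built by the function above.
inline uint16_t ReadTransportSeq(const std::array<uint8_t, 8>& ext) {
    return static_cast<uint16_t>((ext[5] << 8) | ext[6]);
}
```

A receiver would pair each value returned by `ReadTransportSeq` with the packet's arrival time before assembling the TransportCC feedback.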
⑥ urn:ietf:params:rtp-hdrext:framemarking
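In WebRTC the RTP payload is encrypted with SRTP. Switching or forwarding nodes that an RTP packet passes through nevertheless sometimes need to know its encoding information, for example to optimize transmission priority. urn:ietf:params:rtp-hdrext:framemarking solves this problem for such nodes, because the RTP header is not encrypted under SRTP.
Different codecs usually define different marking information. Taking H.264 as an example, the layout is: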
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID=2  |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|   TL0PICIDX   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The following information are extracted from the media payload and sent in the Frame Marking RTP header extension.
S: Start of Frame (1 bit) - MUST be 1 in the first packet in a frame within a layer; otherwise MUST be 0.
E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame within a layer; otherwise MUST be 0.
I: Independent Frame (1 bit) - MUST be 1 for frames that can be decoded independent of temporally prior frames, e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/RAP [RFC7798]; otherwise MUST be 0. Note that this bit only signals temporal independence, so it can be 1 in spatial or quality enhancement layers that depend on temporally co-located layers but not temporally prior frames.
D: Discardable Frame (1 bit) - MUST be 1 for frames that can be discarded, and still provide a decodable media stream; otherwise MUST be 0.
B: Base Layer Sync (1 bit) - MUST be 1 if this frame only depends on the base layer; otherwise MUST be 0. If no scalability is used, this MUST be 0.
TID: Temporal ID (3 bits) - The base temporal layer starts with 0, and increases with 1 for each higher temporal layer/sub-layer. If no scalability is used, this MUST be 0.
LID: Layer ID (8 bits) - Identifies the spatial and quality layer encoded. If no scalability is used, this MUST be 0 or omitted. When omitted, TL0PICIDX MUST also be omitted.
TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - Running index of base temporal layer 0 frames when TID is 0. When TID is not 0, this indicates a dependency on the given index. If no scalability is used, this MUST be 0 or omitted. When omitted, LID MUST also be omitted.
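A forwarding node could read these fields roughly as follows. This is a sketch under the field definitions quoted above: `FrameMarking` and `ParseFrameMarking` are hypothetical names, `data` is assumed to point at the element's data bytes (after the ID/L byte), and LID and TL0PICIDX are read only when all three bytes are present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical holder for the decoded frame marking fields.
struct FrameMarking {
    bool start;          // S
    bool end;            // E
    bool independent;    // I
    bool discardable;    // D
    bool baseLayerSync;  // B
    uint8_t tid;         // TID, 3 bits
    uint8_t lid;         // LID, 0 when omitted
    uint8_t tl0PicIdx;   // TL0PICIDX, 0 when omitted
};

// `data`/`len` describe the extension element's data (1 or 3 bytes expected).
bool ParseFrameMarking(const uint8_t* data, size_t len, FrameMarking* fm) {
    if (len < 1) return false;
    fm->start = (data[0] & 0x80) != 0;
    fm->end = (data[0] & 0x40) != 0;
    fm->independent = (data[0] & 0x20) != 0;
    fm->discardable = (data[0] & 0x10) != 0;
    fm->baseLayerSync = (data[0] & 0x08) != 0;
    fm->tid = data[0] & 0x07;
    // LID and TL0PICIDX are either both present or both omitted.
    fm->lid = (len >= 3) ? data[1] : 0;
    fm->tl0PicIdx = (len >= 3) ? data[2] : 0;
    return true;
}
```

A node that wants to drop packets under congestion could, for instance, check `discardable` or compare `tid` against a target temporal layer without touching the encrypted payload.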
⑦ urn:ietf:params:rtp-hdrext:ssrc-audio-level
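The structure is simple: this is the audio-specific header extension defined in RFC 6464, which carries the audio level of the packet. In a large conference with many audio streams, for example, it can be used to adjust the audio mixing strategy.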
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | len=0 |V|    level    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
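Decoding the single data byte is straightforward. The sketch below uses the RFC 6464 semantics: V is the voice activity flag and level is a 7-bit value in -dBov, so 0 is the loudest and 127 the quietest. `AudioLevel` and `ParseAudioLevel` are names made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the decoded audio level byte.
struct AudioLevel {
    bool voiceActivity;  // V bit
    uint8_t level;       // 0..127, expressed in -dBov (127 is the quietest)
};

// `b` is the one data byte that follows the ID/len byte.
inline AudioLevel ParseAudioLevel(uint8_t b) {
    return AudioLevel{(b & 0x80) != 0, static_cast<uint8_t>(b & 0x7F)};
}
```

A mixer could, for example, forward only the few streams with the smallest `level` values among those with `voiceActivity` set.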
⑧ urn:3gpp:video-orientation
Coordination of video orientation (CVO)
3GPP TS 26.114 section 7.4.5
For the definition, see 26114-g52.doc.
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | len=0 |0 0 0 0 C F R R|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
With the following definitions:
C = Camera: indicates the direction of the camera used for this video stream. It can be used by the MTSI client in receiver to e.g. display the received video differently depending on the source camera.
0: Front-facing camera, facing the user. If camera direction is unknown by the sending MTSI client in the terminal then this is the default value used.
1: Back-facing camera, facing away from the user.
F = Flip: indicates a horizontal (left-right flip) mirror operation on the video as sent on the link.
0: No flip operation. If the sending MTSI client in terminal does not know if a horizontal mirror operation is necessary, then this is the default value used.
1: Horizontal flip operation
R1, R0 = Rotation: indicates the rotation of the video as transmitted on the link. The receiver should rotate the video to compensate that rotation. E.g. a 90° Counter Clockwise rotation should be compensated by the receiver with a 90° Clockwise rotation prior to displaying.
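The fields defined above can be extracted with a few bit masks. This sketch assumes the bit layout 0 0 0 0 C F R1 R0 (most significant bit first), so C is bit 3, F is bit 2 and R1 R0 are the two lowest bits; check that against 3GPP TS 26.114 before relying on it. `VideoOrientation` and `ParseCvo` are hypothetical names, and the rotation is returned as the raw 2-bit code rather than mapped to degrees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the decoded CVO byte.
struct VideoOrientation {
    bool backCamera;   // C: 0 = front-facing, 1 = back-facing
    bool flip;         // F: 1 = horizontal flip
    uint8_t rotation;  // raw 2-bit code (R1 R0), 0..3
};

// `b` is the one data byte that follows the ID/len byte.
// Assumed layout (MSB first): 0 0 0 0 C F R1 R0.
inline VideoOrientation ParseCvo(uint8_t b) {
    return VideoOrientation{(b & 0x08) != 0, (b & 0x04) != 0,
                            static_cast<uint8_t>(b & 0x03)};
}
```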
⑨ urn:ietf:params:rtp-hdrext:toffset
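Transmission Time Offset: the offset between the timestamp carried in an RTP packet and the packet's actual transmission time. For some packets that were encoded ahead of time, the timestamp differs from the actual send time; including the offset allows the transit time to be computed more accurately.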
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | len=2 |              transmission offset              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
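In RFC 5450 the transmission offset is a signed 24-bit integer in RTP timestamp units, so reading it requires sign extension. A minimal sketch follows; `ParseTransmissionOffset` is a name invented here, and `d` is assumed to point at the three data bytes in network byte order.

```cpp
#include <cassert>
#include <cstdint>

// Reads the 3-byte big-endian transmission offset and sign-extends it.
// `d` points at the data bytes that follow the ID/len byte.
inline int32_t ParseTransmissionOffset(const uint8_t* d) {
    const uint32_t v = (static_cast<uint32_t>(d[0]) << 16) |
                       (static_cast<uint32_t>(d[1]) << 8) |
                       static_cast<uint32_t>(d[2]);
    // Bit 23 is the sign bit of the 24-bit value.
    if (v & 0x00800000) return static_cast<int32_t>(v) - 0x01000000;
    return static_cast<int32_t>(v);
}
```

Adding the parsed offset to the packet's RTP timestamp gives the actual transmission time in the same clock units.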
0x03 Reference Standards
https://tools.ietf.org/html/rfc3550
https://tools.ietf.org/html/rfc8285
https://tools.ietf.org/html/rfc7941
https://tools.ietf.org/html/rfc7656
https://tools.ietf.org/html/rfc7742
0x04 References
https://www.cnblogs.com/ishen/p/12050077.html
https://www.jianshu.com/p/ab32a8a3552f
https://blog.csdn.net/linux_vae/article/details/100558804