Pushing Audio for Live Streaming with libRTMP

Latest recommended article published on 2024-08-16 08:56:45

花花永不落幕


Article link: https://blog.csdn.net/u014287775/article/details/53505525


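Whether you push video or audio to an RTMP server, the data must be packed in FLV format and then sent through the librtmp API. As the FLV file format shows, before sending audio or video data packets to the RTMP server we must first push an audio Tag [Audio Sequence Header], hereafter the "audio sync packet", or a video Tag [AVC Sequence Header], hereafter the "video sync packet". Let us look at this Tag first: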

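The first two bytes describe the audio parameters. For an AAC audio packet they are, for example, AF 00. In the first byte, the high four bits (A) give the audio format, here AAC. In the low four bits (F), the first two bits give the sampling rate, the next bit gives the sample size, and the last bit gives the channel type (mono or stereo). The second byte, 0x00, indicates that the payload is an Audio Sequence Header; if it is 0x01, the payload is audio data.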

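The FLV format defines the fields of the first byte as follows, from the high bits to the low bits:

SoundFormat (4 bits): the audio format; 10 means AAC
SoundRate (2 bits): 0 = 5.5 kHz, 1 = 11 kHz, 2 = 22 kHz, 3 = 44 kHz
SoundSize (1 bit): 0 = 8-bit samples, 1 = 16-bit samples
SoundType (1 bit): 0 = mono, 1 = stereo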
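The two-byte AAC header described above can be sketched in a few lines of Python. This is only an illustration of the bit packing, not part of librtmp; the function name `aac_audio_tag_header` and its parameter names are hypothetical, chosen to mirror the field names used in the FLV specification.

```python
def aac_audio_tag_header(aac_packet_type, sound_rate=3, sound_size=1, sound_type=1):
    """Pack the two bytes that start every AAC audio tag body in FLV.

    aac_packet_type: 0 for an Audio Sequence Header, 1 for audio data.
    The defaults (44 kHz, 16-bit, stereo) yield the common first byte 0xAF.
    """
    sound_format = 10  # the FLV format code for AAC (hex A)
    first = (sound_format << 4) | (sound_rate << 2) | (sound_size << 1) | sound_type
    return bytes([first, aac_packet_type])


sequence_header_prefix = aac_audio_tag_header(0)
audio_data_prefix = aac_audio_tag_header(1)
print(sequence_header_prefix.hex(), audio_data_prefix.hex())  # af00 af01
```

Only the second byte changes between the sync packet and the data packets that follow it.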

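For AAC audio, these two bytes are followed by two more bytes of AAC-specific header, the AudioSpecificConfig, which specifies the parameters of the AAC stream in detail.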

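The first 5 bits give the AAC type (AAC LC and so on), the next 4 bits give the AAC sampling rate, the next 4 bits give the output channel information, and the last 3 bits are three flag bits. For example, 0x13 0x90 (binary 00010 0111 0010 000) means ObjectProfile=2 (AAC LC), SamplingFrequencyIndex=7, and ChannelConfiguration=2 (two channels).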
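To check the 0x13 0x90 example, the bit fields can be pulled apart as in the following sketch. It assumes the simple two-byte form only (it ignores the extended encodings the standard allows), and the function name and dictionary keys are hypothetical.

```python
def parse_audio_specific_config(data):
    """Split a two-byte AudioSpecificConfig into its bit fields."""
    value = (data[0] << 8) | data[1]  # 16 bits, most significant bit first
    return {
        "audio_object_type": (value >> 11) & 0x1F,        # first 5 bits
        "sampling_frequency_index": (value >> 7) & 0x0F,  # next 4 bits
        "channel_configuration": (value >> 3) & 0x0F,     # next 4 bits
        "flags": value & 0x07,                            # last 3 bits
    }


config = parse_audio_specific_config(bytes([0x13, 0x90]))
print(config)
# {'audio_object_type': 2, 'sampling_frequency_index': 7, 'channel_configuration': 2, 'flags': 0}
```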

The fields in detail:

Audio type (5 bits): there are two cases. MPEG-4 supports many types:

1: AAC Main
2: AAC LC (Low Complexity)
3: AAC SSR (Scalable Sample Rate)
4: AAC LTP (Long Term Prediction)
5: SBR (Spectral Band Replication)
6: AAC Scalable
7: TwinVQ
8: CELP (Code Excited Linear Prediction)
9: HVXC (Harmonic Vector eXcitation Coding)
10: Reserved
11: Reserved
12: TTSI (Text-To-Speech Interface)
13: Main Synthesis
14: Wavetable Synthesis
15: General MIDI
16: Algorithmic Synthesis and Audio Effects
17: ER (Error Resilient) AAC LC
18: Reserved
19: ER AAC LTP
20: ER AAC Scalable
21: ER TwinVQ
22: ER BSAC (Bit-Sliced Arithmetic Coding)
23: ER AAC LD (Low Delay)
24: ER CELP
25: ER HVXC
26: ER HILN (Harmonic and Individual Lines plus Noise)
27: ER Parametric
28: SSC (SinuSoidal Coding)
29: PS (Parametric Stereo)
30: MPEG Surround

MPEG-2 supports fewer types: only the first three AAC types from the MPEG-4 list.

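Audio sampling rate (4 bits): these bits can represent 16 values, each mapped to one sampling rate. For example, a value of 0 corresponds to 96000 Hz.

The supported sampling rates and their values are: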

0 : 96000Hz

1 : 88200Hz

2 : 64000Hz

3 : 48000Hz

4 : 44100Hz

5 : 32000Hz

6 : 24000Hz

7 : 22050Hz

8 : 16000Hz

9 : 12000Hz

10: 11025Hz

11: 8000Hz

12: 7350Hz

13: Reserved

14: Reserved

15: Escape value (the sampling rate is written explicitly in the 24 bits that follow)

Channel information (4 bits): the values map to channel layouts as follows:

0: Defined in AOT Specific Config

1: 1 channel: front-center

2: 2 channels: front-left, front-right

3: 3 channels: front-center, front-left, front-right

4: 4 channels: front-center, front-left, front-right, back-center

5: 5 channels: front-center, front-left, front-right, back-left, back-right

6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel

7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel

8-15: Reserved
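Going the other way, the tables above are enough to build the two AudioSpecificConfig bytes from plain parameters. The sketch below is a minimal illustration under the same two-byte assumption; `SAMPLE_RATES` and `make_audio_specific_config` are hypothetical names, and the three flag bits are left at zero.

```python
# Index in this list = sampling rate value from the table above.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def make_audio_specific_config(audio_object_type, sample_rate_hz, channel_configuration):
    """Pack audio type, sampling rate and channel layout into two bytes."""
    if not 1 <= audio_object_type <= 30:
        raise ValueError("audio object type must be in 1..30")
    if not 0 <= channel_configuration <= 7:
        raise ValueError("channel configuration must be in 0..7")
    if sample_rate_hz not in SAMPLE_RATES:
        raise ValueError("sampling rate is not in the table")
    frequency_index = SAMPLE_RATES.index(sample_rate_hz)
    value = (audio_object_type << 11) | (frequency_index << 7) | (channel_configuration << 3)
    return value.to_bytes(2, "big")


print(make_audio_specific_config(2, 44100, 2).hex())  # 1210
print(make_audio_specific_config(2, 22050, 2).hex())  # 1390
```

The second call reproduces the 0x13 0x90 example: AAC LC, 22050 Hz, two channels.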

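After the Audio Sequence Header has been sent, the audio data follows. Each frame must be prefixed with the audio header before it is sent; for AAC this is the two bytes 0xAF 0x01. Remember that the second byte is 0x01, which marks the payload as audio data rather than an Audio Sequence Header. For non-AAC audio, place the data directly after the header, wrap the result as one audio packet, and send it through the librtmp API. Note that the second byte is the AAC packet type and exists only for AAC, so for other formats the header is the first byte alone. For AAC audio, first strip the first 7 bytes (or 9 bytes) of the frame, then combine the remaining data with the two-byte header into one audio packet and send it. These bytes are stripped because every AAC frame carries its own 7-byte or 9-byte header. Since the Audio Sequence Header and AAC's own AudioSpecificConfig have already been sent, there is no need to repeat the AAC header with every frame, so only the audio data itself is sent. Why 7 bytes in some cases and 9 in others? The AAC frame header can be either size, as the following description of the header explains. (Some experts' blogs have already covered this in detail, so their text is quoted directly here.)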
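The packing steps above can be sketched as follows. The sketch only builds the packet bodies; handing them to librtmp is left out. It assumes each input frame starts with an ADTS header and decides between 7 and 9 bytes from the protection absent bit (the lowest bit of the second header byte, field D in the header table quoted later). All function names are hypothetical.

```python
def adts_header_length(frame):
    """Return 7 if the frame has no CRC, 9 if it has one."""
    if len(frame) < 7 or frame[0] != 0xFF or (frame[1] & 0xF0) != 0xF0:
        raise ValueError("frame does not start with an ADTS syncword")
    protection_absent = frame[1] & 0x01
    return 7 if protection_absent else 9


def audio_sync_packet_body(audio_specific_config):
    """Body of the first packet: 0xAF 0x00 followed by AudioSpecificConfig."""
    return b"\xaf\x00" + audio_specific_config


def audio_data_packet_body(adts_frame):
    """Body of each later packet: 0xAF 0x01 followed by the frame without its header."""
    return b"\xaf\x01" + adts_frame[adts_header_length(adts_frame):]


# Example frames: a 7-byte header plus 3 payload bytes, and a 9-byte header plus 1 payload byte.
frame_without_crc = bytes([0xFF, 0xF1, 0x50, 0x80, 0x01, 0x5F, 0xFC]) + b"\x01\x02\x03"
frame_with_crc = bytes([0xFF, 0xF0, 0x50, 0x80, 0x01, 0x5F, 0xFC, 0x00, 0x00]) + b"\x09"

print(audio_sync_packet_body(b"\x12\x10").hex())         # af001210
print(audio_data_packet_body(frame_without_crc).hex())   # af01010203
print(audio_data_packet_body(frame_with_crc).hex())      # af0109
```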

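Live streaming uses H.264 for video and AAC for audio. A frame of audio compressed by FAAC needs some simple processing before it can be packed and sent to FMS over RTMP, or saved to an FLV file. The main step is to strip the AAC frame header and extract the relevant information from it.

Where G.711A data takes 1024 bytes, the AAC equivalent is typically only a little over 300 bytes.

The frames produced by FAAC can be saved directly as an AAC file, which the player built into Windows 7 can play. This is convenient for testing.

The AAC frame header is normally 7 bytes, or 9 bytes if it includes a CRC checksum. It contains the parameters of the audio.

The structure is as follows: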

AAAAAAAA AAAABCCD EEFFFFGH HHIJKLMM MMMMMMMM MMMOOOOO OOOOOOPP (QQQQQQQQ QQQQQQQQ)

Header consists of 7 or 9 bytes (without or with CRC).

Source: "AAC audio header for RTMP live streaming to FMS, AAC Frame Header (repost)", niulei20012001's blog

Letter	Length (bits)	Description
A	12	syncword 0xFFF, all bits must be 1
B	1	MPEG Version: 0 for MPEG-4, 1 for MPEG-2
C	2	Layer: always 0
D	1	protection absent, Warning, set to 1 if there is no CRC and 0 if there is CRC
E	2	profile, the MPEG-4 Audio Object Type minus 1
F	4	MPEG-4 Sampling Frequency Index (15 is forbidden)
G	1	private stream, set to 0 when encoding, ignore when decoding
H	3	MPEG-4 Channel Configuration (in the case of 0, the channel configuration is sent via an inband PCE)
I	1	originality, set to 0 when encoding, ignore when decoding
J	1	home, set to 0 when encoding, ignore when decoding
K	1	copyrighted stream, set to 0 when encoding, ignore when decoding
L	1	copyright start, set to 0 when encoding, ignore when decoding
M	13	frame length, this value must include 7 or 9 bytes of header length: FrameLength = (ProtectionAbsent == 1 ? 7 : 9) + size(AACFrame)
O	11	Buffer fullness
P	2	Number of AAC frames (RDBs) in ADTS frame minus 1, for maximum compatibility always use 1 AAC frame per ADTS frame
Q	16	CRC if protection absent is 0
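The table can be turned into a small parser by treating the first 7 bytes as one 56-bit number and reading each field by its bit offset. This is a sketch for illustration only; the function name and key names are hypothetical, and the optional CRC (field Q) is not read.

```python
def parse_adts_header(frame):
    """Decode the fixed 56 bits of an ADTS header into named fields."""
    if len(frame) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    bits = int.from_bytes(frame[:7], "big")

    def field(offset, length):
        # Read `length` bits that start `offset` bits after the first header bit.
        return (bits >> (56 - offset - length)) & ((1 << length) - 1)

    header = {
        "syncword": field(0, 12),                  # A
        "mpeg_version": field(12, 1),              # B
        "layer": field(13, 2),                     # C
        "protection_absent": field(15, 1),         # D
        "profile": field(16, 2),                   # E
        "sampling_frequency_index": field(18, 4),  # F
        "channel_configuration": field(23, 3),     # H
        "frame_length": field(30, 13),             # M
        "buffer_fullness": field(43, 11),          # O
        "frames_minus_one": field(54, 2),          # P
    }
    if header["syncword"] != 0xFFF:
        raise ValueError("syncword is not 0xFFF")
    return header


# A 7-byte header followed by 2 payload bytes, so the frame length is 9.
example_frame = bytes([0xFF, 0xF1, 0x50, 0x80, 0x01, 0x3F, 0xFC]) + b"\x00\x00"
parsed = parse_adts_header(example_frame)
print(parsed["profile"], parsed["sampling_frequency_index"],
      parsed["channel_configuration"], parsed["frame_length"])  # 1 4 2 9
```

The one-bit fields G, I, J, K and L are skipped here because the table says to ignore them when decoding.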

The most important fields are E, F, and H.

E is the type (the MPEG-4 Audio Object Type minus 1):

0: AAC Main
1: AAC LC (Low Complexity)
2: AAC SSR (Scalable Sample Rate)
3: AAC LTP (Long Term Prediction)

F is the sampling frequency:

0: 96000 Hz
1: 88200 Hz
2: 64000 Hz
3: 48000 Hz
4: 44100 Hz
5: 32000 Hz
6: 24000 Hz
7: 22050 Hz
8: 16000 Hz
9: 12000 Hz
10: 11025 Hz
11: 8000 Hz
12: 7350 Hz

H is the channel configuration:

1: 1 channel: front-center
2: 2 channels: front-left, front-right
3: 3 channels: front-center, front-left, front-right
4: 4 channels: front-center, front-left, front-right, back-center
5: 5 channels: front-center, front-left, front-right, back-left, back-right
6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
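Since E, F and H carry exactly what the AudioSpecificConfig needs, the two bytes for the audio sync packet can be derived from the header of the first AAC frame. The following is a minimal sketch under that assumption; the function name is hypothetical, and it relies on E being the audio object type minus 1, as the table states.

```python
def adts_to_audio_specific_config(adts_header):
    """Build a two-byte AudioSpecificConfig from the E, F and H fields of an ADTS header."""
    if len(adts_header) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    profile = (adts_header[2] >> 6) & 0x03             # E: top 2 bits of the third byte
    frequency_index = (adts_header[2] >> 2) & 0x0F     # F: next 4 bits
    channel_configuration = ((adts_header[2] & 0x01) << 2) | (adts_header[3] >> 6)  # H
    if frequency_index > 12:
        raise ValueError("unsupported sampling frequency index")
    audio_object_type = profile + 1                    # E stores the type minus 1
    value = (audio_object_type << 11) | (frequency_index << 7) | (channel_configuration << 3)
    return value.to_bytes(2, "big")


# AAC LC, 44100 Hz, 2 channels.
first_header = bytes([0xFF, 0xF1, 0x50, 0x80, 0x01, 0x3F, 0xFC])
print(adts_to_audio_specific_config(first_header).hex())  # 1210
```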