COOK audio parsing in the FFmpeg rmvb demuxer

Latest recommended article published on 2023-12-20 08:45:00

crazy0126

Category: FFmpeg. Tags: FFmpeg rmvb demuxer audio COOK

Article link: https://blog.csdn.net/crazy0126/article/details/21154849

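These are notes collected about the COOK codec. They are very helpful for understanding the part of the ffmpeg rmvb demuxer that parses audio packets.

Corresponding code: the audio-parsing part of the function ff_rm_parse_packet in /libavformat/rmdec.c.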

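First, my own understanding:

Each audio frame is one sub packet.

Multiple sub packets form one logical unit, a packet.

sub_packet_h packets form one 'scrambling unit'.

What is finally read through read_packet and sent to the decoder for decoding is the sub packet.

To obtain the sub packets, a mathematical formula is used to determine the position of each frame, and the frame is then read from that position.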

Original source: http://www.rockbox.org/wiki/CookCodec

From the rm file header, the following are some of the parameters of interest to an audio decoder:

avg_packet_size
sub_packet_size
sub_packet_h
block_align

The mystery of frames, packets and sub-packets

In cook, a packet is a logical unit for storing audio frames. One packet typically contains multiple mixed frames, which rm calls sub_packets.

For almost any rm audio file, block_align == avg_packet_size, which is also synonymous with frame_size in the rm header. The 'regular' audio frame, that is, an audio buffer which can be sent to a decoder, is called a sub_packet. In this context, rm's frame_size is the size of one logical unit of multiple frames, and sub_packet_size is the size of a regular audio frame.
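The relationship between the two sizes can be sketched in a few lines of C. This is only an illustration: the struct `RmAudioParams` and the function `sub_packets_per_packet` are hypothetical names, not part of ffmpeg or Rockbox, and the sketch assumes that frame_size is an exact multiple of sub_packet_size.

```c
#include <assert.h>

/* Hypothetical holder for the header fields discussed above. */
typedef struct RmAudioParams {
    int frame_size;      /* size of one packet (one unit of packed frames) */
    int sub_packet_size; /* size of one regular audio frame (sub_packet) */
} RmAudioParams;

/* Number of sub_packets (regular audio frames) stored in one packet.
 * Assumes frame_size is a multiple of sub_packet_size. */
int sub_packets_per_packet(const RmAudioParams *p)
{
    assert(p->sub_packet_size > 0);
    return p->frame_size / p->sub_packet_size;
}
```

For example, with made-up values frame_size = 640 and sub_packet_size = 64, one packet would hold 10 audio frames.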

block_align

As stated in the previous paragraph, in an rm file the value of block_align is equal to frame_size or avg_packet_size, which is the size of one unit of packed frames. That is not exactly the case for cook, however. For cook, block_align == sub_packet_size, which is the size of an actual audio frame. This has to be done manually, though: the rm header only provides the values of the parameters, and the parser has to handle the rest. This means that the parser checks whether the file contains cook audio and, if so, assigns the value of sub_packet_size to block_align.
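A minimal sketch of that parser-side decision might look as follows. The struct `RmAudioHeader`, its `codec_tag` field and the function `rm_block_align` are assumptions made for this illustration; they are not the ffmpeg data structures, and the sketch assumes that cook streams are identified by the tag "cook".

```c
#include <string.h>

/* Hypothetical view of the audio fields read from the rm header. */
typedef struct RmAudioHeader {
    const char *codec_tag; /* assumed: codec identifier string, "cook" for cook audio */
    int frame_size;        /* size of one packet of packed frames */
    int sub_packet_size;   /* size of one actual audio frame */
} RmAudioHeader;

/* Pick the block_align value the decoder should see:
 * sub_packet_size for cook, frame_size otherwise. */
int rm_block_align(const RmAudioHeader *hdr)
{
    if (strcmp(hdr->codec_tag, "cook") == 0)
        return hdr->sub_packet_size;
    return hdr->frame_size;
}
```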

sub_packet_h

This is described in ffmpeg as a 'descrambling parameter'. After the frames (sub_packets) are packed into packets, the packets are further packed into scrambling units, each containing sub_packet_h packets (packets, not sub_packets). So for a parser to construct proper audio frames that the decoder can handle, it must first loop through the packets and descramble them. For this process, the parser has to determine the position of each audio frame in the scrambling unit according to a rather opaque mathematical formula. Luckily the ffmpeg developers were able to figure out this formula, which is:

sps*(h*x+((h+1)/2)*(y&1)+(y>>1))

sps = sub_packet_size;
h = sub_packet_h;
x = the position of the current frame in its parent packet;
y = sub_packet_count; a sub_packet counter for each scrambling unit.
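To make the formula concrete, here is a small C sketch of how it could be used to place the frames of one packet into a descrambled buffer. The function names `cook_frame_offset` and `cook_place_packet` and the parameter `frames_per_packet` are invented for this example. The sketch treats the formula's result as a byte offset into a buffer holding one whole scrambling unit, and takes y as the index of the current packet within the unit (0 <= y < h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of frame x of packet y inside the descrambled unit:
 * sps*(h*x+((h+1)/2)*(y&1)+(y>>1)) */
size_t cook_frame_offset(int sps, int h, int x, int y)
{
    assert(sps > 0 && h > 0 && x >= 0 && y >= 0 && y < h);
    return (size_t)sps * (size_t)(h * x + ((h + 1) / 2) * (y & 1) + (y >> 1));
}

/* Copy the frames of packet number y into their descrambled positions.
 * 'packet' holds frames_per_packet frames of sps bytes each;
 * 'unit' must have room for h * frames_per_packet * sps bytes. */
void cook_place_packet(uint8_t *unit, const uint8_t *packet,
                       int sps, int h, int frames_per_packet, int y)
{
    for (int x = 0; x < frames_per_packet; x++)
        memcpy(unit + cook_frame_offset(sps, h, x, y),
               packet + (size_t)x * (size_t)sps, (size_t)sps);
}
```

As a hand-worked case, take h = 4 and x = 0: the packets y = 0, 1, 2, 3 land in frame slots 0, 2, 1, 3. Even-numbered packets fill the first half of each group of h slots and odd-numbered packets fill the second half, so once all h packets of a unit have been placed, every slot has been written exactly once.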

After one scrambling unit has been constructed, its audio frames are sent to the decoder. The decoder takes an input buffer of type uint8_t* and produces an output buffer of type int16_t*.