An Analysis of the MPlayer Source Framework

DOCS/general.txt

So, I'll describe how this stuff works. (OK, let me explain how this thing works.)

The main modules:

 

1. stream.c: this is the input layer, this reads the input media (file, stdin,
   vcd, dvd, network etc).  (This is the file-level input layer: it reads media
   from all kinds of sources.)
   What it has to know: appropriate buffering by sector, seek, skip functions,
   reading by bytes, or blocks of any size.


   The stream_t (stream.h) structure describes the input stream, file/device.

   There is a stream cache layer (cache2.c), it's a wrapper for the stream API.

   It does fork(), then emulates stream driver in the parent process,
   and stream user in the child process, while proxying between them using
   preallocated big memory chunk for FIFO buffer.

 

2. demuxer.c: this does the demultiplexing (separating) of the input into
   audio, video and dvdsub channels, and their reading by buffered packages.

   demuxer.c is basically a framework that is the same for all input formats,
   and there are parsers for each of them (mpeg-es, mpeg-ps, avi, avi-ni, asf);
   these live in the demux_*.c files.

   The structure is demuxer_t. There is only one demuxer.

 

Safirst:

I have already seen at least three demuxers, namely vd, ad and sd.

The rough program flow is:

i   open_stream on the file name yields the video stream vs.

ii  demux_open then opens the vs stream, which serves two purposes:

      a. First part: if there is an as or an ss, open them with open_stream too.

  Remember: a stream corresponds to the notion of a file, not to the data
  streams inside a file.

  If there is an as, that means there is a separate audio file;

  if there is an ss, that means there is a separate subtitle file, perhaps an .srt.


So in the common case there is only a single vs, and everything is driven
through it. That does not mean vs holds only video data with no audio or
subtitles; if there is only a vs, it simply means we are playing one
self-contained file with no external companions.


      b. Second part: call demux_open_stream (a rather complex function; by the
         time it returns, the test program has produced almost all of its output)
         to open each of the three streams, obtaining the three demuxers vd, ad, sd.


      c. Finally: the function merges these three demuxers into one and returns
         it. Could this be the real meaning of "only one demuxer"? Maybe.


*Merging* the three channels:

          The merge here is at the file level, not the data level: it combines
          the three distinct files vs, as and ss.

So, once more: the absence of as and ss does not mean there is no audio or no
subtitles; it only means they are not external. A single vs can contain
everything, if it is already muxed together.

 

 

2.a. demux_packet_t, that is DP.
    Contains one chunk (avi) or packet (asf, mpg).
    They are stored in memory in a linked list, because of their differing sizes.

 

2.b. demux_stream_t, that is DS.
    Every channel (a/v/s) has one.
    This contains the packets for the stream (see 2.a).
    For now, there can be 3 for each demuxer:
  - audio (d_audio)
  - video (d_video)
  - DVD subtitle (d_dvdsub)

 

2.c. stream header.
  There are 2 types (for now): sh_audio_t and sh_video_t.
  This contains every parameter essential for decoding,
  such as input/output buffers, chosen codec, fps, etc.

  There is one for each stream in the file.
  At least one for video; if sound is present then another,
  and if there are more, there'll be one structure for each.
  (At least one for video: this quietly implies that the first thing a file
  maps to is video rather than anything else, so everything is video-first.
  That matches the curious call in demux_open:

    ret->type = ret->file_format = DEMUXER_TYPE_DEMUXERS;
    // Video is the most important :-)

  (demux_demuxer.c) That comment makes the point directly: the type of vs
  becomes the type of the file! If there is audio, add another header; in any
  case, one header structure per stream.)

 

  These are filled in two ways:
  - according to the header (avi/asf), or
  - demux_mpg.c does it (mpg) when it finds a new stream.

 

  If a new stream is found, the
  ====> Found audio/video stream: <id>  message is displayed.

  The chosen stream header and its demuxer stream are connected together
  (ds->sh and sh->ds) to simplify the usage, so it's enough to pass either
  the ds or the sh, depending on the function.

 

  For example: we have an asf file with 6 streams inside it, 1 audio and 5
  video. During the reading of the header, 6 sh structs are created, 1
  audio and 5 video.

  When it starts reading the packets, it chooses the streams of the first
  found audio and video packet, and sets the sh pointers of d_audio and
  d_video accordingly. Later it reads only these streams.

  Of course the user can force choosing a specific stream with the
  -vid and -aid switches.

  A good example for this is the DVD, where the English stream is not
  always the first, so every VOB has a different language :)
  That's when we have to use, for example, the -aid 128 switch.

 

 Now, how does this reading work?

  - demuxer.c/demux_read_data() is called; it is told how many bytes to read,
    where to put them (memory address), and from which DS. The codecs call this.

 

Safirst: demuxer.c/demux_read_data() does this job, and it is called by the
codecs.

The function is easy to read in full: internally it uses memcpy to copy len
bytes from the current position pos in ds->buffer into mem; bytes is both
returned as the amount already copied and used as the offset into mem:

memcpy(mem + bytes, &ds->buffer[ds->buffer_pos], x);

 

demux_read_data:

  - this checks if the given DS's buffer contains something; if so, it reads
    from there as much as needed. If there isn't enough, it calls
    ds_fill_buffer(),

 

ds_fill_buffer:

       which:
1. - checks if the given DS has buffered packages (DP's);

     if so, it moves the oldest one into the buffer, and reading continues.

     If the list is empty, it calls demux_fill_buffer():

 

2. - this calls the parser for the input format, which reads the file onward
   and moves the found packages into their buffers.

   If we'd like an audio package, but only a bunch of video packages are
   available, then sooner or later the
   DEMUXER: Too many (%d in %d bytes) audio packets in the buffer
   error shows up.

 

2.d. video.c: this file/function handles the reading and assembling of the
    video frames.

     Each call to video_read_frame() should read and return a single video
     frame, and its duration in seconds (float).

     The implementation is split into 2 big parts: reading from mpeg-like
     streams, and reading from one-frame-per-chunk files (avi, asf, mov).

     Then it calculates the duration, either from a fixed FPS value, or from
     the PTS difference before and after reading the frame.

 

2.e. other utility functions: there is some useful code there, like the
    AVI muxer, or the mp3 header parser, but leave them for now.

    So everything is ok 'till now. It can all be found in the libmpdemux/
    library. It should compile outside of the mplayer tree; you just have to
    implement a few simple functions, like mp_msg() to print messages, etc.
    See libmpdemux/test.c for an example.
    (test.c is missing from the 1.0rcX series, but present in 1.0preX and 0.9X.)

 

See also formats.txt, for a description of common media file formats and
their implementation details in libmpdemux.

 

Safirst C. Ke: OK, my work starts here. Building test.c is covered in another
post, because it deserves one! 2010-01-25 11:40:48

 

Now, go on:

3. mplayer.c - ooh, he's the boss :)
    Its main purpose is connecting the other modules, and maintaining A/V sync.

    The given stream's actual position is in the 'timer' field of the
    corresponding stream header (sh_audio / sh_video).

*************
* important *
*************

  The structure of the playing loop:

         while(not EOF)
         {
             fill audio buffer (read & decode audio) + increase a_frame
             read & decode a single video frame + increase v_frame
             sleep (wait until a_frame >= v_frame)
             display the frame
             apply A-V PTS correction to a_frame
             handle events (keys, lirc etc) -> pause, seek, ...
         }

  When playing (a/v), it increases the variables by the duration of the
  played a/v:

  - with audio this is played bytes / sh_audio->o_bps

  Note: i_bps = number of compressed bytes for one second of audio
        o_bps = number of uncompressed bytes for one second of audio
        (this is = bps*samplerate*channels)


  - with video this is usually == 1.0/fps,
    but I have to note that fps doesn't really matter for video;
    asf, for example, doesn't have it.
    Instead there is "duration", and it can change per frame.


  MPEG2 has "repeat_count", which delays the frame by 1-2.5 ...
  Maybe only AVI and MPEG1 have a fixed fps.

  So everything works fine as long as audio and video are in perfect sync:
  the audio runs and provides the timing, and once a frame's time has passed,
  the next frame is displayed.

  But what if these two aren't synchronized in the input file? PTS correction
  kicks in. The input demuxers read the PTS (presentation timestamp) of the
  packages, and with it we can see whether the streams are synchronized.

  Then MPlayer can correct the a_frame, within a given maximal boundary (see
  the -mc option). The sum of the corrections can be found in c_total.

  Of course this is not everything; several things are tricky. For example the
  soundcard's delay, which has to be corrected by MPlayer!

  The audio delay is the sum of all these:

  - bytes read since the last timestamp:
    t1 = d_audio->pts_bytes/sh_audio->i_bps

  - if Win32/ACM, the bytes stored in the audio input buffer:
    t2 = a_in_buffer_len/sh_audio->i_bps

  - uncompressed bytes in the audio out buffer:
    t3 = a_buffer_len/sh_audio->o_bps

  - not-yet-played bytes stored in the soundcard's (or DMA's) buffer:
    t4 = get_audio_delay()/sh_audio->o_bps

  From this we can calculate what PTS we need for the just-played audio; then,
  after we compare this with the video's PTS, we have the difference!
  
  Life didn't get simpler with AVI. There's the "official" timing method, the
  BPS-based one: the header contains how many compressed audio bytes or chunks
  belong to one second of frames.

  In the AVI stream header there are 2 important fields: dwSampleSize, and the
  dwRate/dwScale pair:

  - If dwSampleSize is 0, it's a VBR stream, so its bitrate isn't constant.
    It means that 1 chunk stores 1 sample, and dwRate/dwScale gives the
    chunks/sec value.

  - If dwSampleSize is >0, it's constant bitrate, and the time can be
    measured this way:
    time = (bytepos/dwSampleSize) / (dwRate/dwScale)
    (so the sample's number is divided by the samplerate).
 
 
 Now the audio can be handled as a stream, which can be cut into chunks, but
 can also be one single chunk.

  The other method can be used only for interleaved files: from the order of
  the chunks, a timestamp (PTS) value can be calculated.

  The PTS of the video chunks is simple: chunk number / fps.

  The PTS of an audio chunk is the same as that of the previous video chunk.


  We have to pay attention to the so-called "audio preload": there is a delay
  between the audio and video streams. It's usually 0.5-1.0 sec, but can be
  totally different.

  The exact value used to be measured by hand, but now demux_avi.c handles it:
  at the first audio chunk after the first video chunk, it calculates the A/V
  difference and takes that as the measure of the audio preload.

******************
* audio playback *
******************
3.a. audio playback:

  Some words on audio playback:
  The playing itself is not hard; what is hard is:
  1. knowing when to write into the buffer, without blocking
  2. knowing how much of what we wrote has been played

  The first is needed for audio decoding, and to keep the buffer full
  (so the audio will never skip).
  The second is needed for correct timing, because some soundcards delay
  even 3-7 seconds, which can't be forgotten about.

  To solve this, OSS gives several possibilities:

  - ioctl(SNDCTL_DSP_GETODELAY): tells how many unplayed bytes are in the
    soundcard's buffer -> perfect for timing, but not all drivers
    support it :(

  - ioctl(SNDCTL_DSP_GETOSPACE): tells how much we can write into the
    soundcard's buffer without blocking. If the driver doesn't support
    GETODELAY, we can use this to find out how big the delay is.

  - select(): should tell whether we can write into the buffer without
    blocking. Unfortunately it doesn't say how much we could :((
    Also, it doesn't work, or works badly, with some drivers.
    Only used if none of the above works.
  
**********
* codecs *
**********
4. Codecs. Consists of libmpcodecs/* and separate files or libs,
   for example liba52, libmpeg2, xa/*, alaw.c, opendivx/*, loader, mp3lib.

   mplayer.c doesn't call them directly, but through the dec_audio.c and
   dec_video.c files, so mplayer.c doesn't have to know anything about
   the codecs.

   libmpcodecs contains a wrapper for every codec:
   some of them include the codec's implementation itself,
   some call functions from other files included with mplayer,
   and some call optional external libraries.

   File naming convention in libmpcodecs:
   ad_*.c - audio decoder (called through dec_audio.c)
   vd_*.c - video decoder (called through dec_video.c)
   ve_*.c - video encoder (used by mencoder)
   vf_*.c - video filter  (see option -vf)

   On this topic, see also:
   libmpcodecs.txt - The structure of the codec-filter path, with explanation
   dr-methods.txt  - Direct rendering, MPI buffer management for video codecs
   codecs.conf.txt - How to write/edit the codec configuration file (codecs.conf)
   codec-devel.txt - Mike's hints about codec development - a bit OUTDATED
   hwac3.txt       - about S/PDIF audio passthrough

5. libvo: this displays the frames.
   For details on this, read libvo.txt.

6. libao2: this controls audio playback.
6.a audio plugins

   For details on this, read libao2.txt.
