[FFMPEG] FFplay Audio/Video Synchronization Analysis (Part 1)

gomogomono

Published 2024-09-07 17:31:21


Article link: https://blog.csdn.net/cc289123557/article/details/141999301


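Analysis of the main() entry function

The flowchart of the main() entry function in ffplay.c is as follows:
[Figure: flowchart of main()]

The code is as follows: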

/* Called from the main */
int main(int argc, char **argv)
{
    int flags;
    VideoState *is;

    init_dynload(); /* init_dynload(): restrict the DLL search path */

    av_log_set_flags(AV_LOG_SKIP_REPEATED);
    parse_loglevel(argc, argv, options);

    /* register all codecs, demux and protocols */
#if CONFIG_AVDEVICE
    avdevice_register_all();
#endif
    avformat_network_init();

    signal(SIGINT , sigterm_handler); /* Interrupt (ANSI).    */
    signal(SIGTERM, sigterm_handler); /* Termination (ANSI).  */

    show_banner(argc, argv, options); /* show_banner(): print version and copyright info */

    parse_options(NULL, argc, argv, options, opt_input_file); /* parse the command-line arguments */

    if (!input_filename) {......}
    
    if (display_disable) {......}
    /* start: build the SDL init flags */
    flags = SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER;
    if (audio_disable)
        flags &= ~SDL_INIT_AUDIO;
    else {......}
    
    if (display_disable)
        flags &= ~SDL_INIT_VIDEO;
    /* end: the flags decide which SDL subsystems are initialized, e.g. no SDL window,
       so that only audio is played without any picture */
    if (SDL_Init (flags)) {
        av_log(NULL, AV_LOG_FATAL, "Could not initialize SDL - %s\n", SDL_GetError());
        av_log(NULL, AV_LOG_FATAL, "(Did you set the DISPLAY variable?)\n");
        exit(1);
    }
    
    SDL_EventState(SDL_SYSWMEVENT, SDL_IGNORE);
    SDL_EventState(SDL_USEREVENT, SDL_IGNORE);

    if (!display_disable) {......}
    
    is = stream_open(input_filename, file_iformat); /* stream_open() */
    if (!is) {
        av_log(NULL, AV_LOG_FATAL, "Failed to initialize VideoState!\n");
        do_exit(NULL);
    }

    event_loop(is); /* event_loop() */

    /* never returns */

    return 0;
}

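The flow above is discussed in two parts: minor functions and key functions.

The minor functions are:

**1.** init_dynload() sets the rules for loading dynamic libraries. It is a security measure: on Windows, DLLs are by default also loaded from the current directory, which is an easy attack vector. The function removes the current directory from the DLL search path by calling SetDllDirectory("").

**2.** show_banner() prints ffplay's copyright and version information. You can remove the call to keep the console output cleaner.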

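The key functions are:

1. parse_options() parses the command-line arguments. It is a key function, but to keep things simple it is only touched on here. This article uses just the -i option, so in this context the function merely sets the global variable input_filename. Command-line parsing is covered in detail in "Analysis of the parse_options function".

2. SDL_CreateWindow() creates the SDL window; see the official SDL documentation for details.

3. stream_open() is the most important of all. As the flowchart above shows, up to 4 threads may originate from stream_open(). Their roles are described below.

read_thread(): reads AVPackets from the network or the disk and puts them into a PacketQueue.

audio_thread(): takes AVPackets from the audioq PacketQueue, feeds them to the decoder, and puts the decoded AVFrames into a FrameQueue.

video_thread(): takes AVPackets from the videoq PacketQueue, feeds them to the decoder, and puts the decoded AVFrames into a FrameQueue.

subtitle_thread(): the subtitle thread. ffplay's subtitle support is somewhat incomplete, so it can be ignored here.

Not all 4 threads are necessarily created. If the MP4 file contains no audio stream, audio_thread() is not created, and likewise for the other threads.

The relationship between these 4 threads is as follows:
[Figure: relationship between the 4 threads]

read_thread is the producer; audio_thread and video_thread are the consumers.

The last key function is event_loop(). It is an infinite loop whose main job is to keep handling keyboard events and displaying video frames.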

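Analysis of the stream_open function

Before looking at stream_open(), we need to understand some basic data structures that it uses.

The first is struct VideoState, which can be regarded as the player's global manager. It has a great many fields: clocks, queues, decoders and all kinds of state live in VideoState.

This article does not cover every field of VideoState, only those used by stream_open(). Below is a trimmed-down version, with the fields reordered for readability.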

typedef struct VideoState {
    int last_video_stream, last_audio_stream, last_subtitle_stream;
    char* filename;
    AVInputFormat* iformat;
    int width, height, xleft, ytop;
    FrameQueue pictq;
    FrameQueue sampq;
    PacketQueue videoq;
    PacketQueue audioq;
    SDL_cond *continue_read_thread;
    SDL_Thread *read_tid;
    Clock audclk;
    Clock vidclk;
    Clock extclk;    
    int audio_clock_serial;
    int audio_volume;
    int muted;
    int av_sync_type;
} VideoState;

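1. int last_video_stream records the index of the video stream that was opened most recently (it is assigned in stream_component_open(), shown later). This matters when the file contains several video streams. last_audio_stream and last_subtitle_stream play the same role for audio and subtitles.

2. char* filename stores the name of the opened media file, or a network URL.

3. AVInputFormat* iformat is the container format. By default the container format is detected automatically, for example from the suffix of filename, but you can also force the file to be parsed as a particular format:

ffplay -i juren-5s.mp4 -f flv
The -f flv given on the command line ends up in AVInputFormat* iformat. It is not stored as a string, of course; the name is first looked up to find the matching AVInputFormat.

4. int width, height, xleft, ytop are the size and position of the player window; the position is given by xleft and ytop.

5. FrameQueue pictq and FrameQueue sampq are the AVFrame queues for video and audio.

6. PacketQueue videoq and PacketQueue audioq are the AVPacket queues for video and audio.

7. SDL_cond *continue_read_thread is an SDL condition variable used for inter-thread communication. The read_thread() thread sleeps for 10 ms in the following two cases.

Case 1: the PacketQueue queues are full (their total size exceeds the limit), so no more data can be added.

Case 2: every stream already has more than the minimum amount of buffered data.

If the PacketQueue is drained completely within those 10 ms, so that audio_thread() or video_thread() has no AVPacket left to read, read_thread() must be woken up as soon as possible.

Likewise, after a seek, read_thread() must be woken from its sleep quickly.

So the condition variable SDL_cond *continue_read_thread is mainly used for communication between read_thread and the audio_thread and video_thread threads.

8. SDL_Thread *read_tid is the thread handle of read_thread.

The C++ standard library has offered portable threads since C++11. C only gained <threads.h> in C11, as an optional feature that is not universally available, so ffplay takes a shortcut and uses the threads and condition variables of SDL, which is cross-platform.

9. Clock audclk is the audio clock; it records the current playback position of the audio stream.

10. Clock vidclk is the video clock; it records the current playback position of the video stream.

11. Clock extclk is the external clock. It takes the pts of the first audio or video frame as its starting point and then advances with wall-clock time, so it represents the current moment in physical time. Whether the first audio frame or the first video frame is used depends on which one av_read_frame() returns first.

12. int audio_clock_serial is a slightly unusual field: only audio has it. There is no corresponding video_clock_serial field.

audio_clock_serial is just a temporary variable that holds the serial of the audio frame being played (the serial is stored in ffplay's Frame wrapper, not in AVFrame itself). It needs no special attention. The video path uses the Frame's serial directly.

13. int audio_volume is the player's volume.

14. int muted tells whether the player is muted. C89 has no bool type (C99 added _Bool and <stdbool.h>), and FFmpeg code conventionally uses int for flags.

15. int av_sync_type is the audio/video synchronization mode. There are 3 modes: audio clock as master, video clock as master, and external clock as master. The default is the audio clock.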
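The three synchronization modes can be illustrated with a small sketch. The enum and function names below are hypothetical and are not ffplay identifiers. The fallback rule (video master needs a video stream, audio master needs an audio stream, otherwise use the next candidate) is an assumption made for this sketch, not something the text above states.

```c
#include <stdbool.h>

/* Hypothetical names modeled on the three sync modes described above. */
enum SyncType { SYNC_AUDIO_MASTER, SYNC_VIDEO_MASTER, SYNC_EXTERNAL_CLOCK };

/* Return the clock that can actually act as master.
 * Assumed fallback: a missing video stream falls back to audio,
 * a missing audio stream falls back to the external clock. */
enum SyncType effective_sync_type(enum SyncType requested, bool has_audio, bool has_video)
{
    if (requested == SYNC_VIDEO_MASTER)
        return has_video ? SYNC_VIDEO_MASTER : SYNC_AUDIO_MASTER;
    if (requested == SYNC_AUDIO_MASTER)
        return has_audio ? SYNC_AUDIO_MASTER : SYNC_EXTERNAL_CLOCK;
    return SYNC_EXTERNAL_CLOCK;
}
```

With the default audio-master mode, a file without an audio stream would end up driven by the external clock in this model.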

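Some of the fields above are described only briefly, because for now a rough understanding is enough. Later articles show in detail where these fields are used.

FrameQueue and PacketQueue are so important that a figure is included to make them easier to understand.
[Figure: structure of FrameQueue and PacketQueue]

The queue member of FrameQueue is an array. The 16 shown in the figure is a macro in the code (FRAME_QUEUE_SIZE), which normally equals 16. The pkt_list member of PacketQueue is an AVFifoBuffer; "FifoBuffer function library explained" is recommended reading.

FrameQueue is linked to its PacketQueue through a pktq pointer. Each of the two queues has its own lock (mutex) and condition variable (cond), and every operation on them must hold the lock.

Now let's analyze stream_open(). The flowchart and code are as follows:
[Figure: flowchart of stream_open()]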

static VideoState *stream_open(const char *filename,
                               const AVInputFormat *iformat)
{
    VideoState *is;

    is = av_mallocz(sizeof(VideoState)); /* allocate a zero-initialized VideoState */
    if (!is)
        return NULL;
    is->last_video_stream = is->video_stream = -1;
    is->last_audio_stream = is->audio_stream = -1;
    is->last_subtitle_stream = is->subtitle_stream = -1;
    is->filename = av_strdup(filename);
    if (!is->filename)
        goto fail;
    is->iformat = iformat;
    is->ytop    = 0;
    is->xleft   = 0;

    /* start video display */
    if (frame_queue_init(&is->pictq, &is->videoq, VIDEO_PICTURE_QUEUE_SIZE, 1) < 0) /* video frame queue */
        goto fail;
    if (frame_queue_init(&is->subpq, &is->subtitleq, SUBPICTURE_QUEUE_SIZE, 0) < 0) /* subtitle frame queue */
        goto fail;
    if (frame_queue_init(&is->sampq, &is->audioq, SAMPLE_QUEUE_SIZE, 1) < 0) /* audio frame queue */
        goto fail;

    if (packet_queue_init(&is->videoq) < 0 || /* video packet queue */
        packet_queue_init(&is->audioq) < 0 || /* audio packet queue */
        packet_queue_init(&is->subtitleq) < 0) /* subtitle packet queue */
        goto fail;
        goto fail;

    if (!(is->continue_read_thread = SDL_CreateCond())) {
        av_log(NULL, AV_LOG_FATAL, "SDL_CreateCond(): %s\n", SDL_GetError());
        goto fail;
    }
    
    init_clock(&is->vidclk, &is->videoq.serial); /* video clock */
    init_clock(&is->audclk, &is->audioq.serial); /* audio clock */
    init_clock(&is->extclk, &is->extclk.serial); /* external clock */
    is->audio_clock_serial = -1;
    if (startup_volume < 0)
        av_log(NULL, AV_LOG_WARNING, "-volume=%d < 0, setting to 0\n", startup_volume);
    if (startup_volume > 100)
        av_log(NULL, AV_LOG_WARNING, "-volume=%d > 100, setting to 100\n", startup_volume);
    startup_volume = av_clip(startup_volume, 0, 100);
    startup_volume = av_clip(SDL_MIX_MAXVOLUME * startup_volume / 100, 0, SDL_MIX_MAXVOLUME);
    is->audio_volume = startup_volume;
    is->muted = 0;
    is->av_sync_type = av_sync_type;
    is->read_tid     = SDL_CreateThread(read_thread, "read_thread", is); /* start read_thread() */
    if (!is->read_tid) {
        av_log(NULL, AV_LOG_FATAL, "SDL_CreateThread(): %s\n", SDL_GetError());
fail:
        stream_close(is);
        return NULL;
    }
    return is;
}

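As the code above shows, the implementation of stream_open() is quite simple: it initializes the queues, initializes the clocks, and then creates a read_thread() thread to do the work.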
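One small detail in stream_open() is the volume conversion: the user-facing value is a percentage, while SDL's mixer works on a 0 to SDL_MIX_MAXVOLUME scale. The sketch below reproduces that arithmetic without depending on SDL. MIX_MAXVOLUME, clip_int and percent_to_mix_volume are names invented for this example, and the value 128 is assumed to match SDL2's SDL_MIX_MAXVOLUME.

```c
/* Assumed to equal SDL_MIX_MAXVOLUME in SDL2; redefined so the sketch builds without SDL. */
#define MIX_MAXVOLUME 128

/* Clamp v into [lo, hi], in the spirit of av_clip(). */
static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Map a volume in percent (0..100) to the mixer scale (0..MIX_MAXVOLUME). */
int percent_to_mix_volume(int percent)
{
    percent = clip_int(percent, 0, 100);
    return clip_int(MIX_MAXVOLUME * percent / 100, 0, MIX_MAXVOLUME);
}
```

For instance, a startup volume of 50 maps to 64 on the mixer scale, and out-of-range inputs are clamped before the conversion.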

frame_queue_init() is also fairly simple internally, but one detail deserves attention:

static int frame_queue_init(FrameQueue *f, PacketQueue *pktq, int max_size, int keep_last)
{
    int i;
    memset(f, 0, sizeof(FrameQueue));
    if (!(f->mutex = SDL_CreateMutex())) {
        av_log(NULL, AV_LOG_FATAL, "SDL_CreateMutex(): %s\n", SDL_GetError());
        return AVERROR(ENOMEM);
    }
    if (!(f->cond = SDL_CreateCond())) {
        av_log(NULL, AV_LOG_FATAL, "SDL_CreateCond(): %s\n", SDL_GetError());
        return AVERROR(ENOMEM);
    }
    f->pktq = pktq;
    f->max_size = FFMIN(max_size, FRAME_QUEUE_SIZE);
    f->keep_last = !!keep_last; /* normalize to 0 or 1 */
    for (i = 0; i < f->max_size; i++)
        if (!(f->queue[i].frame = av_frame_alloc()))
            return AVERROR(ENOMEM);
    return 0;
}

There is nothing special about the !! operation. It simply converts any non-zero value to 1: if keep_last is 5, negating it twice yields 1, while 0 stays 0.
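A minimal sketch of the idiom, using a hypothetical helper name (to_flag is not an ffplay function):

```c
/* Collapse any non-zero integer to 1 and keep 0 as 0.
 * The first ! yields 0 or 1 with inverted truth value; the second ! flips it back. */
int to_flag(int value)
{
    return !!value;
}
```

Negative values are normalized to 1 as well, since they are non-zero.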

One line in packet_queue_init() also deserves attention.

q->abort_request = 1;

When abort_request is set to 1, the audio_thread() and video_thread() decoder threads exit. Therefore, before creating a decoder thread, ffplay resets abort_request to 0, as shown below:

static int decoder_start(Decoder *d, int (*fn)(void *), const char *thread_name, void* arg)
{
    packet_queue_start(d->queue); /* packet_queue_start(): clears abort_request */
    d->decoder_tid = SDL_CreateThread(fn, thread_name, arg);
    if (!d->decoder_tid) {
        av_log(NULL, AV_LOG_ERROR, "SDL_CreateThread(): %s\n", SDL_GetError());
        return AVERROR(ENOMEM);
    }
    return 0;
}

static void packet_queue_start(PacketQueue *q)
{
    SDL_LockMutex(q->mutex);
    q->abort_request = 0; /* abort_request cleared: the decoder thread may run */
    q->serial++;
    SDL_UnlockMutex(q->mutex);
}

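That completes stream_open(). Control now passes to the new thread running read_thread(); the next section analyzes how read_thread() works.

Analysis of the read_thread demuxing thread

The main job of read_thread() is to read AVPackets from the MP4 file and put them into a PacketQueue. So we first need to look at struct PacketQueue and struct MyAVPacketList: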

typedef struct MyAVPacketList {
    AVPacket *pkt;
    int serial;
} MyAVPacketList;

typedef struct PacketQueue {
    AVFifoBuffer *pkt_list; // stores MyAVPacketList entries
    int nb_packets;
    int size;
    int64_t duration;
    int abort_request;
    int serial;
    SDL_mutex *mutex;
    SDL_cond *cond;
} PacketQueue;

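1. AVFifoBuffer *pkt_list: AVFifoBuffer is a circular-buffer FIFO, i.e. a ring-shaped first-in-first-out buffer. It stores struct MyAVPacketList entries.

2. int nb_packets is the number of AVPackets in the queue.

3. int size is the amount of data buffered in the queue. It is the sum, over all queued entries, of the size of the list entry itself plus AVPacket->size.

4. int64_t duration is the duration of the queue, obtained by summing AVPacket->duration over all packets in the queue.

5. abort_request is the queue's abort flag. Setting it to 1 causes audio_thread and video_thread to exit.

6. int serial is the queue's serial number; it is incremented by 1 on every seek. The other structure, MyAVPacketList, also has a serial field.

The two serials are compared in order to discard stale buffered packets. When do buffered packets become stale? When the playback position jumps.

For example, suppose the PacketQueue currently holds 8 packets that all belong to minute 30 of the file. If you press the → key and jump to minute 35, those 8 buffered packets are no longer valid and must be discarded.

Because PacketQueue::serial is incremented on every seek while MyAVPacketList::serial keeps its original value, the two serials differ and the packet is dropped.

7. SDL_mutex *mutex is an SDL mutex, held while the queue is modified.

8. SDL_cond *cond is an SDL condition variable used for communication between read_thread() and the audio_thread() and video_thread() threads.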
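The serial mechanism from item 6 can be modeled in a few lines. This is a deliberately simplified sketch: Entry and Queue are stand-ins for MyAVPacketList and PacketQueue, there is no real AVPacket, no locking, and the fixed capacity of 16 is arbitrary. All names are hypothetical.

```c
/* Simplified stand-ins for MyAVPacketList and PacketQueue. */
typedef struct { int payload; int serial; } Entry;
typedef struct { Entry items[16]; int count; int serial; } Queue;

/* Enqueue a payload stamped with the queue's current serial. Returns 0 on success. */
int queue_put(Queue *q, int payload)
{
    if (q->count >= 16)
        return -1;
    q->items[q->count].payload = payload;
    q->items[q->count].serial  = q->serial;
    q->count++;
    return 0;
}

/* A seek bumps the queue serial; entries queued earlier keep their old serial. */
void queue_seek(Queue *q)
{
    q->serial++;
}

/* Count the entries whose serial still matches the queue, i.e. that are not stale. */
int queue_count_valid(const Queue *q)
{
    int valid = 0;
    for (int i = 0; i < q->count; i++)
        if (q->items[i].serial == q->serial)
            valid++;
    return valid;
}
```

After 8 packets are queued and a seek happens, none of them matches the new serial, while a packet queued after the seek does. A consumer that compares the two serials can therefore skip the 8 stale entries.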

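For ffplay -i juren-5s.mp4, the flowchart of the read_thread thread is as follows:
[Figure: flowchart of read_thread()]

The logic inside read_thread() is relatively complex and has quite a few key points. First, the meaning of the st_index[] array: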

/* this thread gets the stream from the disk or the network */
static int read_thread(void *arg)
{
    VideoState *is = arg;
    AVFormatContext *ic = NULL;
    int err, i, ret;
    int st_index[AVMEDIA_TYPE_NB]; /* st_index: selected stream index per media type */
    AVPacket *pkt = NULL;
    int64_t stream_start_time;
    int pkt_in_play_range = 0;
    const AVDictionaryEntry *t;
    SDL_mutex *wait_mutex = SDL_CreateMutex();
    int scan_all_pmts_set = 0;
    int64_t pkt_ts;

    if (!wait_mutex) {
        av_log(NULL, AV_LOG_FATAL, "SDL_CreateMutex(): %s\n", SDL_GetError());
        ret = AVERROR(ENOMEM);
        goto fail;
    }

    memset(st_index, -1, sizeof(st_index)); /* st_index: -1 means no stream selected */
    is->eof = 0;
...........

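The st_index[] array is sized by the macro AVMEDIA_TYPE_NB, so it has one slot for every kind of stream: audio, video, subtitle, attachment and so on. It is needed because an MP4 file may contain, for example, more than one video stream.

Suppose streams 5 and 6 are both video streams. Then st_index[AVMEDIA_TYPE_VIDEO] holds 5 or 6, indicating which video stream to play. The same applies to the other stream types.

By default the values in st_index[] are determined by av_find_best_stream(), which picks the best audio or video stream based on parameters such as bit_rate and codec_info_nb_frames.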
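The memset(st_index, -1, ...) call in the snippet above relies on a common C idiom. The sketch below shows why it works; MEDIA_TYPE_COUNT is a stand-in for AVMEDIA_TYPE_NB (its value here is an assumption), and reset_stream_indexes is a hypothetical helper.

```c
#include <string.h>

#define MEDIA_TYPE_COUNT 5 /* stand-in for AVMEDIA_TYPE_NB; the real value comes from libavutil */

/* Mark every slot as "no stream selected".
 * memset writes the byte 0xFF into every byte, and an int whose bits are all ones
 * is -1 on two's-complement machines, so each element ends up as -1. */
void reset_stream_indexes(int st_index[MEDIA_TYPE_COUNT])
{
    memset(st_index, -1, sizeof(int) * MEDIA_TYPE_COUNT);
}
```

Note that this trick only works for the byte patterns 0 and -1; memset cannot be used to fill an int array with, say, 5.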

The second key point is interrupt_callback, which installs an interrupt callback function.

    pkt = av_packet_alloc();
    if (!pkt) {
        av_log(NULL, AV_LOG_FATAL, "Could not allocate packet.\n");
        ret = AVERROR(ENOMEM);
        goto fail;
    }
    ic = avformat_alloc_context();
    if (!ic) {
        av_log(NULL, AV_LOG_FATAL, "Could not allocate context.\n");
        ret = AVERROR(ENOMEM);
        goto fail;
    }
    ic->interrupt_callback.callback = decode_interrupt_cb; /* interrupt callback */
    ic->interrupt_callback.opaque = is;
    if (!av_dict_get(format_opts, "scan_all_pmts", NULL, AV_DICT_MATCH_CASE)) {
        av_dict_set(&format_opts, "scan_all_pmts", "1", AV_DICT_DONT_OVERWRITE);
        scan_all_pmts_set = 1;
    }

decode_interrupt_cb() is implemented as follows:

static int decode_interrupt_cb(void *ctx)
{
    VideoState *is = ctx;
    return is->abort_request;
}

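is->abort_request controls whether the whole player should stop playback and exit.

When playing a local file the effect of interrupt_callback is not very noticeable, because av_read_frame() returns very quickly when reading a local MP4.

When playing a network stream over a poor connection, however, av_read_frame() might take, say, 8 seconds to return. If the user wants to close the player during that time, av_read_frame() has to stop blocking and return as soon as possible. This is where interrupt_callback comes in: during those 8 seconds av_read_frame() periodically invokes the callback, and as soon as the callback returns a non-zero value av_read_frame() stops blocking and returns immediately.

Note: when playing a network stream, avformat_find_stream_info() can block for a long time just like av_read_frame().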
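The polling pattern behind the interrupt callback can be sketched without FFmpeg. Everything below is a model: PlayerState stands in for VideoState, slow_read stands in for a blocking I/O call, and the polling granularity is invented for the example. It is not how libavformat is implemented internally.

```c
/* Callback shape: a non-zero return value means "abort the blocking operation". */
typedef int (*interrupt_cb)(void *opaque);

typedef struct { int abort_request; } PlayerState; /* hypothetical stand-in for VideoState */

/* Same idea as decode_interrupt_cb(): report the player's abort flag. */
int abort_cb(void *opaque)
{
    const PlayerState *state = opaque;
    return state->abort_request;
}

/* Simulate a slow read that needs max_polls waiting steps to complete.
 * The callback is consulted before every step.
 * Returns the number of steps taken, or -1 if the read was interrupted. */
int slow_read(int max_polls, interrupt_cb cb, void *opaque)
{
    for (int step = 0; step < max_polls; step++) {
        if (cb && cb(opaque))
            return -1; /* interrupted: hand control back to the caller at once */
        /* a real implementation would wait for network data here */
    }
    return max_polls;
}
```

Setting abort_request to 1 from another thread makes the next poll return -1, which is why the player can exit promptly even while a read is stalled.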

The third key point of read_thread() is the use of avformat_open_input().

    err = avformat_open_input(&ic, is->filename, is->iformat, &format_opts);
    if (err < 0) {
        print_error(is->filename, err);
        ret = -1;
        goto fail;
    }
    if (scan_all_pmts_set)
        av_dict_set(&format_opts, "scan_all_pmts", NULL, AV_DICT_MATCH_CASE);

    if ((t = av_dict_get(format_opts, "", NULL, AV_DICT_IGNORE_SUFFIX))) {
        av_log(NULL, AV_LOG_ERROR, "Option %s not found.\n", t->key);
        ret = AVERROR_OPTION_NOT_FOUND;
        goto fail;
    }

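The last argument, format_opts, is an AVDictionary. Note that whenever avformat_open_input() consumes an option from the dictionary, that option is removed from the dictionary.

That is why the code afterwards checks which options are left over. Options that could not be used are usually the result of a mistyped command-line argument.

Container formats such as MP4, FLV and TS share some options and differ in others. The options supported by a particular demuxer can be listed with: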

ffmpeg -h demuxer=mp4

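Tip: the various streaming formats can also be regarded as containers.

read_thread() also handles the initial seek; in the simple scenario discussed here that branch is not taken.

The fourth key point of read_thread() is the use of the AVRational sar variable: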

    is->show_mode = show_mode;
    if (st_index[AVMEDIA_TYPE_VIDEO] >= 0) {
        AVStream *st = ic->streams[st_index[AVMEDIA_TYPE_VIDEO]];
        AVCodecParameters *codecpar = st->codecpar;
        AVRational sar = av_guess_sample_aspect_ratio(ic, st, NULL);
        if (codecpar->width)
            set_default_window_size(codecpar->width, codecpar->height, sar); /* key point 4 */
    }

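sar is not easy to understand, and it confused me at first. I used to think sar was width/height (the aspect ratio of the picture), but it is not.

sar is the sample aspect ratio, i.e. the shape of a single pixel, a legacy of older display devices. There is no need to dwell on it; just remember that if the displayed size is obtained by stretching width and height according to sar, the picture is shown without distortion. In most cases sar is 1:1.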
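As a rough illustration of how sar affects the displayed size, the sketch below scales the coded width by the sample aspect ratio. Ratio and display_width are hypothetical names, and the rule that an invalid sar is treated as 1:1 is an assumption of this sketch. ffplay's own window-size calculation does more (it also fits the result into the available window).

```c
typedef struct { int num, den; } Ratio; /* minimal stand-in for AVRational */

/* Width at which a picture should be shown so that it is not distorted,
 * keeping the height unchanged. An unknown or invalid SAR is treated as 1:1. */
int display_width(int coded_width, Ratio sar)
{
    if (sar.num <= 0 || sar.den <= 0)
        return coded_width;
    return (int)((long long)coded_width * sar.num / sar.den);
}
```

For example, a 720x576 picture with a sar of 16:15 would be shown 768 pixels wide, which gives the familiar 4:3 shape, whereas with a sar of 1:1 the width stays 720.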

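Next comes the most important part of read_thread(): the call to stream_component_open(). The decoder threads audio_thread(), video_thread() and so on are created from within stream_component_open().

Everything done by the code so far amounts to finding the best audio and video streams, installing callbacks, various initialization, and opening the container.

Now we reach the main task of read_thread(): entering the for (;;) {...} infinite loop, which keeps reading AVPackets from the container and puts each one into the corresponding PacketQueue.

The for loop has some key points as well: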

    for (;;) {
        if (is->abort_request)
            break;
        if (is->paused != is->last_paused) {
            is->last_paused = is->paused;
            if (is->paused)
                is->read_pause_return = av_read_pause(ic);
            else
                av_read_play(ic);
        }
#if CONFIG_RTSP_DEMUXER || CONFIG_MMSH_PROTOCOL

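For local files av_read_pause() has no effect. It only matters for network streams: some streaming protocols support pausing, and once paused the server stops sending data to ffplay. To resume the data flow, av_read_play() must be called.

The second key point in the for loop is checking whether the queues hold enough AVPackets; if they do, the thread sleeps for 10 ms: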

        /* if the queue are full, no need to read more */
        if (infinite_buffer<1 &&
              (is->audioq.size + is->videoq.size + is->subtitleq.size > MAX_QUEUE_SIZE
            || (stream_has_enough_packets(is->audio_st, is->audio_stream, &is->audioq) &&
                stream_has_enough_packets(is->video_st, is->video_stream, &is->videoq) &&
                stream_has_enough_packets(is->subtitle_st, is->subtitle_stream, &is->subtitleq)))) {
            /* wait 10 ms */
            SDL_LockMutex(wait_mutex);
            SDL_CondWaitTimeout(is->continue_read_thread, wait_mutex, 10); /* sleep for up to 10 ms */
            SDL_UnlockMutex(wait_mutex);
            continue;
        }

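When playing a local file, infinite_buffer stays below 1 (its default is -1, and read_thread() only forces it to 1 for real-time streams), so it can be ignored here.

As you can see, whether there are enough AVPackets is decided from the total size of the queues and from stream_has_enough_packets(), which is implemented as follows: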

static int stream_has_enough_packets(AVStream *st, int stream_id, PacketQueue *queue) {
    return stream_id < 0 ||
           queue->abort_request ||
           (st->disposition & AV_DISPOSITION_ATTACHED_PIC) ||
           queue->nb_packets > MIN_FRAMES && (!queue->duration || av_q2d(st->time_base) * queue->duration > 1.0);
}

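stream_has_enough_packets() essentially checks that the queue holds more than MIN_FRAMES packets and that their total duration exceeds 1 second (the duration check is skipped when the queue's duration is 0).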
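The same decision can be written against plain integers, which makes the individual conditions easier to test. QueueStats and has_enough_packets are hypothetical names, the time base is passed as two ints instead of an AVRational, and the value 25 for MIN_FRAMES is an assumption of this sketch.

```c
#define MIN_FRAMES 25 /* assumed threshold for this sketch */

typedef struct {
    int       nb_packets;    /* packets currently queued */
    long long duration;      /* summed packet durations, in time-base units */
    int       abort_request; /* non-zero once the queue has been aborted */
} QueueStats;

/* Returns 1 when no more packets need to be read for this stream.
 * tb_num/tb_den is the stream time base in seconds per unit. */
int has_enough_packets(int stream_id, int attached_pic, const QueueStats *q,
                       int tb_num, int tb_den)
{
    if (stream_id < 0 || q->abort_request || attached_pic)
        return 1; /* stream absent, queue aborted, or a cover picture only */
    if (q->nb_packets <= MIN_FRAMES)
        return 0; /* too few packets buffered */
    /* enough packets: also require more than 1 second unless the duration is unknown */
    return q->duration == 0 || (double)tb_num / tb_den * (double)q->duration > 1.0;
}
```

With a 1/90000 time base, 30 queued packets spanning 180000 units (2 seconds) count as enough, while 30 packets spanning only 45000 units (0.5 seconds) do not.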

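When the queues are not yet full, the thread goes on to read data from disk and extracts an AVPacket. The packet is not put into the PacketQueue right away, however: it is first checked against the requested playback range: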

        ret = av_read_frame(ic, pkt);
        if (ret < 0) {
            if ((ret == AVERROR_EOF || avio_feof(ic->pb)) && !is->eof) {
                if (is->video_stream >= 0)
                    packet_queue_put_nullpacket(&is->videoq, pkt, is->video_stream);
                if (is->audio_stream >= 0)
                    packet_queue_put_nullpacket(&is->audioq, pkt, is->audio_stream);
                if (is->subtitle_stream >= 0)
                    packet_queue_put_nullpacket(&is->subtitleq, pkt, is->subtitle_stream);
                is->eof = 1;
            }
            if (ic->pb && ic->pb->error) {
                if (autoexit)
                    goto fail;
                else
                    break;
            }
            SDL_LockMutex(wait_mutex);
            SDL_CondWaitTimeout(is->continue_read_thread, wait_mutex, 10);
            SDL_UnlockMutex(wait_mutex);
            continue;
        } else {
            is->eof = 0;
        }
        /* check if packet is in play range specified by user, then queue, otherwise discard */
        stream_start_time = ic->streams[pkt->stream_index]->start_time;
        pkt_ts = pkt->pts == AV_NOPTS_VALUE ? pkt->dts : pkt->pts;
        pkt_in_play_range = duration == AV_NOPTS_VALUE || /* is the packet inside the requested play range? */
                (pkt_ts - (stream_start_time != AV_NOPTS_VALUE ? stream_start_time : 0)) *
                av_q2d(ic->streams[pkt->stream_index]->time_base) -
                (double)(start_time != AV_NOPTS_VALUE ? start_time : 0) / 1000000
                <= ((double)duration / 1000000);
        if (pkt->stream_index == is->audio_stream && pkt_in_play_range) {
            packet_queue_put(&is->audioq, pkt);
        } else if (pkt->stream_index == is->video_stream && pkt_in_play_range
                   && !(is->video_st->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
            packet_queue_put(&is->videoq, pkt);
        } else if (pkt->stream_index == is->subtitle_stream && pkt_in_play_range) {
            packet_queue_put(&is->subtitleq, pkt);
        } else {
            av_packet_unref(pkt);
        }

A variable pkt_in_play_range is used to decide whether the packet lies within the playback range. The concept works like this. Suppose a video is played as follows:

ffplay -i juren-5s.mp4

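juren-5s.mp4 is a 5-second video and no -t was given on the command line, so no limit applies (duration is AV_NOPTS_VALUE) and the effective playback range is the whole file, 0 to 5 seconds. pkt_in_play_range is true for every AVPacket that is read, so all packets fall within the playback range.

But if -t is added: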

ffplay -t 2 -i juren-5s.mp4

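The command above plays only 2 seconds of video, i.e. the playback range becomes 0 to 2 seconds. An AVPacket whose pts is later than 2 seconds is discarded.

This has an interesting consequence: when playback reaches the 2-second mark the picture stops, but read_thread() keeps reading data and discarding it because it is outside the playback range. Only when the end of the file is reached and AVERROR_EOF is returned does it settle into sleeping for short intervals.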
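The pkt_in_play_range expression can be isolated into a small function to see how -t and the start offset interact. This is a sketch under assumptions: NO_VALUE stands in for AV_NOPTS_VALUE, the time base is passed as two ints, and in_play_range is a hypothetical name.

```c
#include <stdint.h>

#define NO_VALUE INT64_MIN /* stand-in for AV_NOPTS_VALUE */

/* pkt_ts and stream_start are in stream time-base units (tb_num/tb_den seconds each).
 * start_us and duration_us are the requested start offset and duration in microseconds. */
int in_play_range(int64_t pkt_ts, int64_t stream_start, int tb_num, int tb_den,
                  int64_t start_us, int64_t duration_us)
{
    if (duration_us == NO_VALUE)
        return 1; /* no duration limit: every packet is accepted */
    /* packet position in seconds, relative to the start of the stream */
    double pos = (double)(pkt_ts - (stream_start != NO_VALUE ? stream_start : 0)) * tb_num / tb_den;
    /* requested start offset in seconds */
    double offset = (double)(start_us != NO_VALUE ? start_us : 0) / 1000000.0;
    return pos - offset <= (double)duration_us / 1000000.0;
}
```

With a millisecond time base and a 2-second limit, a packet at 1500 ms is kept and a packet at 2500 ms is rejected, matching the behaviour described for ffplay -t 2.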

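An AVPacket that falls within the playback range is put into a PacketQueue with packet_queue_put().

As you can see, the audio and video streams each have their own PacketQueue: is->audioq and is->videoq.

At this point the control flow of FFplay stays inside the for (;;) {...} loop, continuously reading AVPacket data.

The code under the fail: label at the end of read_thread() is the cleanup logic that runs when the player exits; it can be ignored for now.

Analysis of the stream_component_open function

The main job of stream_component_open() is to open the decoder that corresponds to an audio or video stream and to start a decoder thread.

The flowchart is as follows:
[Figure: flowchart of stream_component_open()]

stream_component_open() is declared as follows: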

/* open a given stream. Return 0 if OK */
static int stream_component_open(VideoState *is, int stream_index)

The parameters are very simple: the first, VideoState *is, is the global manager; the second, stream_index, is the index of the stream.

Now for the key code inside stream_component_open():

/* open a given stream. Return 0 if OK */
static int stream_component_open(VideoState *is, int stream_index)
{
    AVFormatContext *ic = is->ic;
    AVCodecContext *avctx;
    const AVCodec *codec;
    const char *forced_codec_name = NULL;
    AVDictionary *opts = NULL;
    const AVDictionaryEntry *t = NULL;
    int sample_rate;
    AVChannelLayout ch_layout = { 0 };
    int ret = 0;
    int stream_lowres = lowres;

    if (stream_index < 0 || stream_index >= ic->nb_streams)
        return -1;

    avctx = avcodec_alloc_context3(NULL); /* key code: allocate the decoder context */
    if (!avctx)
        return AVERROR(ENOMEM);

    ret = avcodec_parameters_to_context(avctx, ic->streams[stream_index]->codecpar); /* key code: copy the stream parameters */
    if (ret < 0)
        goto fail;
    avctx->pkt_timebase = ic->streams[stream_index]->time_base;

    codec = avcodec_find_decoder(avctx->codec_id); /* key code: look up the decoder */

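The initial avcodec_alloc_context3() and avcodec_parameters_to_context() calls are routine: allocate memory for a decoder context, then copy the information from the container's stream into it. Containers normally carry codec information.

The second key point is using a specified decoder. For example, if you want H.264 to be decoded by the libopenh264 decoder rather than FFmpeg's native h264 decoder, you can force it on the command line (the decoder has to be available in your FFmpeg build):

ffplay -vcodec libopenh264 juren.mp4

Another case is when the codec information recorded in the container is wrong and you know the correct codec; you can then force it as well. The value given on the command line is assigned to the forced_codec_name variable.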

    if (ret < 0)
        goto fail;
    avctx->pkt_timebase = ic->streams[stream_index]->time_base;

    codec = avcodec_find_decoder(avctx->codec_id); /* key code */

    switch(avctx->codec_type){
        case AVMEDIA_TYPE_AUDIO   : is->last_audio_stream    = stream_index; forced_codec_name =    audio_codec_name; break; /* key code */
        case AVMEDIA_TYPE_SUBTITLE: is->last_subtitle_stream = stream_index; forced_codec_name = subtitle_codec_name; break; /* key code */
        case AVMEDIA_TYPE_VIDEO   : is->last_video_stream    = stream_index; forced_codec_name =    video_codec_name; break; /* key code */
    }
    if (forced_codec_name)
        codec = avcodec_find_decoder_by_name(forced_codec_name); /* key code: forced decoder */
    if (!codec) {
        if (forced_codec_name) av_log(NULL, AV_LOG_WARNING,
                                      "No codec could be found with name '%s'\n", forced_codec_name);
        else                   av_log(NULL, AV_LOG_WARNING,
                                      "No decoder could be found for codec %s\n", avcodec_get_name(avctx->codec_id));
        ret = AVERROR(EINVAL);
        goto fail;
    }

The third key point involves just two functions: filter_codec_opts() and avcodec_open2().

    if (fast)
        avctx->flags2 |= AV_CODEC_FLAG2_FAST;

    opts = filter_codec_opts(codec_opts, avctx->codec_id, ic, ic->streams[stream_index], codec); /* filter_codec_opts() */
    if (!av_dict_get(opts, "threads", NULL, 0))
        av_dict_set(&opts, "threads", "auto", 0);
    if (stream_lowres)
        av_dict_set_int(&opts, "lowres", stream_lowres, 0);
    if ((ret = avcodec_open2(avctx, codec, &opts)) < 0) { /* avcodec_open2() */
        goto fail;
    }
    if ((t = av_dict_get(opts, "", NULL, AV_DICT_IGNORE_SUFFIX))) {
        av_log(NULL, AV_LOG_ERROR, "Option %s not found.\n", t->key);
        ret =  AVERROR_OPTION_NOT_FOUND;
        goto fail;
    }

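filter_codec_opts() essentially extracts the relevant options from the command-line arguments. For example:

ffplay -b:v 2000k -i juren-5s.mp4

The command above specifies a bit rate, but only for video (the :v stream specifier). The option is a candidate for extraction by filter_codec_opts() only when stream_component_open() opens a video stream. Whether a particular option is actually accepted also depends on the decoder; the example is only meant to illustrate the stream specifier.

When stream_component_open() opens an audio stream, b:v is not extracted, because the option applies to video streams only.

So filter_codec_opts() can be seen as a function that processes command-line options and picks out the relevant ones. What exactly counts as relevant can be seen from the function's implementation.
avcodec_open2() then receives the AVDictionary returned by filter_codec_opts().

At this point the decoder options have been set and the decoder has been opened.

The fourth key point is marking the stream as not discarded, which is this line: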

ic->streams[stream_index]->discard = AVDISCARD_DEFAULT;

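As you can see, stream_component_open() sets the discard field of the stream it opens to AVDISCARD_DEFAULT, so that the stream's packets can be read through av_read_frame().

Note that earlier, in read_thread(), ffplay.c set every stream to AVDISCARD_ALL, i.e. the packets of all streams would be discarded.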

    for (i = 0; i < ic->nb_streams; i++) {
        AVStream *st = ic->streams[i];
        enum AVMediaType type = st->codecpar->codec_type;
        st->discard = AVDISCARD_ALL; /* in read_thread(): discard every stream by default */
        if (type >= 0 && wanted_stream_spec[type] && st_index[type] == -1)
            if (avformat_match_stream_specifier(ic, st, wanted_stream_spec[type]) > 0)
                st_index[type] = i;
    }

So if an MP4 contains several video streams, av_read_frame() only returns packets of the selected (best) video stream. The same goes for audio.

The last key point is a switch-case block:

    is->eof = 0;
    ic->streams[stream_index]->discard = AVDISCARD_DEFAULT;
    switch (avctx->codec_type) { /* the switch-case logic */
    case AVMEDIA_TYPE_AUDIO:
#if CONFIG_AVFILTER
        {
            AVFilterContext *sink;

This block is long; it handles audio, video and subtitles separately, and the audio branch is clearly the longest.

Let's go through the key points:

#if CONFIG_AVFILTER
        {
            AVFilterContext *sink;

            is->audio_filter_src.freq           = avctx->sample_rate; /* audio_filter_src */
            ret = av_channel_layout_copy(&is->audio_filter_src.ch_layout, &avctx->ch_layout);
            if (ret < 0)
                goto fail;
            is->audio_filter_src.fmt            = avctx->sample_fmt;
            if ((ret = configure_audio_filters(is, afilters, 0)) < 0) /* configure_audio_filters() */
                goto fail;
            sink = is->out_audio_filter;
            sample_rate    = av_buffersink_get_sample_rate(sink);
            ret = av_buffersink_get_ch_layout(sink, &ch_layout);
            if (ret < 0)
                goto fail;
        }
#else

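First, there is a preprocessor check. The AVFILTER module is enabled in most builds, so the #else branch can be ignored. Note that even though the command ffplay -i juren-5s.mp4 uses no filters, ffplay still creates a filter graph; it is just an empty one. This makes the code path more uniform.

Whether or not filters are given on the command line, the same logic runs.

Next, note the is->audio_filter_src variable in the code above. It stores the audio parameters coming out of the decoder. configure_audio_filters() is then called to create the filters for the audio stream.

The most important thing configure_audio_filters() does is set up the two filters is->in_audio_filter and is->out_audio_filter. Decoded AVFrames are pushed into in_audio_filter, and for playback AVFrames are pulled from out_audio_filter.

The subsequent calls to av_buffersink_get_sample_rate() and related functions fetch the final audio parameters from the buffersink output filter.

The second key point: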

        /* prepare audio output */
        if ((ret = audio_open(is, &ch_layout, sample_rate, &is->audio_tgt)) < 0) /* key point */
            goto fail;
        is->audio_hw_buf_size = ret;
        is->audio_src = is->audio_tgt; /* key point */
        is->audio_buf_size  = 0;
        is->audio_buf_index = 0;

        /* init averaging filter */
        /* start: used when audio is synchronized to the video clock */
        is->audio_diff_avg_coef  = exp(log(0.01) / AUDIO_DIFF_AVG_NB);
        is->audio_diff_avg_count = 0;
        /* since we do not have a precise anough audio FIFO fullness,
           we correct audio sync only if larger than this threshold */
        is->audio_diff_threshold = (double)(is->audio_hw_buf_size) / is->audio_tgt.bytes_per_sec;
        /* end: used when audio is synchronized to the video clock */

        is->audio_stream = stream_index; /* key point */
        is->audio_st = ic->streams[stream_index]; /* key point */

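Internally, audio_open() calls SDL_OpenAudioDevice() to open the audio device. Because audio devices vary widely, the audio frames coming out of the buffersink filter are not necessarily supported by the hardware, so the sample rate may have to be lowered, for instance. Some low-end speakers do not support very high sample rates or many channels.

audio_open() picks a sample rate and channel count that the hardware supports and opens the device with them. The final channel count, sample rate and so on are returned in is->audio_tgt.

So the key point of audio_open() is that it opens the audio device and stores the final audio parameters in is->audio_tgt.

Next, audio_tgt is copied to audio_src: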

is->audio_src = is->audio_tgt;

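This line looks puzzling at first: why assign audio_tgt to audio_src?

is->audio_src is a struct AudioParams, a structure that stores audio format information. The src in its name stands for source, i.e. what the format of the audio source is. Note, however, that this source is not the audio format inside the MP4 file, even though that is a source too.

Here src refers to the source of the is->swr_ctx resampler: when resampling is needed, is->audio_src is the format of the audio fed into is->swr_ctx. The flow is as follows:

[Figure: resampling flow from the buffersink filter through is->swr_ctx to the audio device]

The flowchart above is fairly easy to follow; it shows the case where resampling is required. Resampling is not always needed, though: when the audio format coming out of the buffersink output filter is the same as the format the hardware device was opened with (is->audio_tgt), no resampling takes place.

If you remove the resampler from the flowchart above, doesn't it reduce to exactly is->audio_src = is->audio_tgt;?

So is->audio_src really holds the audio format of the buffersink output filter. Since that format may well be identical to is->audio_tgt, the code simply initializes it that way.

When the buffersink format and audio_tgt differ, resampling is required. From the point of view of the is->swr_ctx resampler, is->audio_src is indeed the source. The code just takes a small shortcut.

As a preview, here is the resampling code from audio_decode_frame(), which is covered later: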

    if (af->frame->format        != is->audio_src.fmt            ||
        av_channel_layout_compare(&af->frame->ch_layout, &is->audio_src.ch_layout) ||
        af->frame->sample_rate   != is->audio_src.freq           ||
        (wanted_nb_samples       != af->frame->nb_samples && !is->swr_ctx)) {
        swr_free(&is->swr_ctx);
        swr_alloc_set_opts2(&is->swr_ctx,
                            &is->audio_tgt.ch_layout, is->audio_tgt.fmt, is->audio_tgt.freq,
                            &af->frame->ch_layout, af->frame->format, af->frame->sample_rate,
                            0, NULL);
        if (!is->swr_ctx || swr_init(is->swr_ctx) < 0) {
            av_log(NULL, AV_LOG_ERROR,
                   "Cannot create sample rate converter for conversion of %d Hz %s %d channels to %d Hz %s %d channels!\n",
                    af->frame->sample_rate, av_get_sample_fmt_name(af->frame->format), af->frame->ch_layout.nb_channels,
                    is->audio_tgt.freq, av_get_sample_fmt_name(is->audio_tgt.fmt), is->audio_tgt.ch_layout.nb_channels);
            swr_free(&is->swr_ctx);
            return -1;
        }
        if (av_channel_layout_copy(&is->audio_src.ch_layout, &af->frame->ch_layout) < 0)
            return -1;
        is->audio_src.freq = af->frame->sample_rate; /* audio_src is re-assigned to the real source format */
        is->audio_src.fmt = af->frame->format;
    }
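The role of audio_src in that snippet can be modeled with a much smaller structure. This is a sketch only: AudioFormat is a simplified stand-in for AudioParams (a channel count instead of a channel layout), update_source_format is a hypothetical name, and the wanted_nb_samples condition of the real code is left out.

```c
typedef struct { int freq; int channels; int fmt; } AudioFormat; /* simplified AudioParams */

/* Compare an incoming frame's format with the recorded source format.
 * Returns 0 if they match (the current resampler state can be kept).
 * Returns 1 if they differ: the resampler would have to be rebuilt,
 * and *src is updated so that it describes the real source from now on. */
int update_source_format(AudioFormat *src, const AudioFormat *frame)
{
    if (frame->fmt == src->fmt &&
        frame->channels == src->channels &&
        frame->freq == src->freq)
        return 0;
    *src = *frame;
    return 1;
}
```

Starting with src equal to the device format means that the very first frame triggers a rebuild only if the filter output really differs from what the device expects, which is the shortcut described above.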

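Summary: ffplay processes audio in two places, the filter graph (is->agraph) and the resampler (is->swr_ctx).

Finally, the audio stream being played is recorded. The video and subtitle streams are handled in a similar way: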

is->audio_stream = stream_index;
is->audio_st = ic->streams[stream_index];

The last key point is the calls to decoder_init() and decoder_start():

        if ((ret = decoder_init(&is->auddec, avctx, &is->audioq, is->continue_read_thread)) < 0) /* decoder_init() */
            goto fail;
        if ((is->ic->iformat->flags & (AVFMT_NOBINSEARCH | AVFMT_NOGENSEARCH | AVFMT_NO_BYTE_SEEK)) && !is->ic->iformat->read_seek) {
            is->auddec.start_pts = is->audio_st->start_time;
            is->auddec.start_pts_tb = is->audio_st->time_base;
        }
        if ((ret = decoder_start(&is->auddec, audio_thread, "audio_decoder", is)) < 0) /* decoder_start() */
            goto out;
        SDL_PauseAudioDevice(audio_dev, 0); /* SDL_PauseAudioDevice(): start audio playback */
        break;

decoder_init() is fairly simple, but it uses a new data structure, struct Decoder, so let's look at that first:

typedef struct Decoder {
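    AVPacket *pkt; // the AVPacket to decode, i.e. the one to be sent to the decoder
    PacketQueue *queue; // the AVPacket queue
    AVCodecContext *avctx; // the decoder context
    int pkt_serial; // serial number
    int finished; // when decoding has finished, finished equals pkt_serial above; this happens once the decoder returns EOF
    int packet_pending; // the previous AVPacket was taken from the queue but could not be sent to the decoder; it is kept in pkt and sent again next time instead of reading from the queue
    SDL_cond *empty_queue_cond; // condition variable, signaled when the AVPacket queue has run out of data
    int64_t start_pts; // pts of the first frame of the stream
    AVRational start_pts_tb; // time base of start_pts
    int64_t next_pts; // pts of the next frame; only audio uses next_pts
    AVRational next_pts_tb; // time base of next_pts
    SDL_Thread *decoder_tid; // decoder thread handle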
} Decoder;

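Some fields of struct Decoder need a few words. First, AVPacket *pkt: this is the packet taken from the AVPacket queue and then sent to the decoder. If sending succeeds, pkt is of course unreferenced. If sending fails because the decoder cannot accept the packet yet, packet_pending is set to 1 and pkt is not unreferenced, so that it can be sent again next time.

The next_pts field also needs explaining. Some readers may wonder: doesn't every AVFrame have a pts? Why is next_pts needed?

The reason is that the pts of some decoded AVFrames is AV_NOPTS_VALUE, in which case next_pts is used to fill it in.

next_pts is computed as the pts of the previous frame plus its number of samples (i.e. how long it plays).

Note: the video stream does not use next_pts for correction; only the audio stream does: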

                    case AVMEDIA_TYPE_AUDIO: /* audio only */
                        ret = avcodec_receive_frame(d->avctx, frame);
                        if (ret >= 0) {
                            AVRational tb = (AVRational){1, frame->sample_rate};
                            if (frame->pts != AV_NOPTS_VALUE)
                                frame->pts = av_rescale_q(frame->pts, d->avctx->pkt_timebase, tb);
                            else if (d->next_pts != AV_NOPTS_VALUE)
                                frame->pts = av_rescale_q(d->next_pts, d->next_pts_tb, tb); /* fall back to next_pts */
                            if (frame->pts != AV_NOPTS_VALUE) {
                                d->next_pts = frame->pts + frame->nb_samples;
                                d->next_pts_tb = tb;
                            }
                        }
                        break;
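The next_pts bookkeeping can be followed more easily with the time-base conversion removed. In the sketch below all pts values are assumed to be already expressed in units of 1/sample_rate, NO_PTS stands in for AV_NOPTS_VALUE, and PtsTracker and resolve_pts are hypothetical names.

```c
#include <stdint.h>

#define NO_PTS INT64_MIN /* stand-in for AV_NOPTS_VALUE */

typedef struct { int64_t next_pts; } PtsTracker; /* mirrors the role of Decoder.next_pts */

/* Decide which pts a decoded audio frame should carry.
 * If the frame has no pts, the predicted next_pts is used instead.
 * Whenever a valid pts is known, the prediction for the following frame
 * becomes that pts plus the frame's sample count. */
int64_t resolve_pts(PtsTracker *t, int64_t frame_pts, int nb_samples)
{
    if (frame_pts == NO_PTS)
        frame_pts = t->next_pts; /* may still be NO_PTS if nothing is known yet */
    if (frame_pts != NO_PTS)
        t->next_pts = frame_pts + nb_samples;
    return frame_pts;
}
```

A frame with pts 0 and 1024 samples sets the prediction to 1024, so a following frame without a pts is assigned 1024 and the prediction advances to 2048.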

Next, let's look at decoder_init():

static int decoder_init(Decoder *d, AVCodecContext *avctx, PacketQueue *queue, SDL_cond *empty_queue_cond) {
    memset(d, 0, sizeof(Decoder));
    d->pkt = av_packet_alloc();
    if (!d->pkt)
        return AVERROR(ENOMEM);
    d->avctx = avctx;
    d->queue = queue;
    d->empty_queue_cond = empty_queue_cond;
    d->start_pts = AV_NOPTS_VALUE;
    d->pkt_serial = -1;
    return 0;
}

As you can see, it just performs some assignments and is fairly simple. One point is worth noting: its empty_queue_cond is actually continue_read_thread under a different name.

        if ((ret = decoder_init(&is->auddec, avctx, &is->audioq, is->continue_read_thread)) < 0)
            goto fail;
        if ((is->ic->iformat->flags & (AVFMT_NOBINSEARCH | AVFMT_NOGENSEARCH | AVFMT_NO_BYTE_SEEK)) && !is->ic->iformat->read_seek) {
            is->auddec.start_pts = is->audio_st->start_time;
            is->auddec.start_pts_tb = is->audio_st->time_base;
        }

Next comes decoder_start():

static int decoder_start(Decoder *d, int (*fn)(void *), const char *thread_name, void* arg)
{
    packet_queue_start(d->queue);
    d->decoder_tid = SDL_CreateThread(fn, thread_name, arg);
    if (!d->decoder_tid) {
        av_log(NULL, AV_LOG_ERROR, "SDL_CreateThread(): %s\n", SDL_GetError());
        return AVERROR(ENOMEM);
    }
    return 0;
}

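This is simple: it starts the decoder thread through SDL.

That covers the audio branch of the switch-case. Video handling is simpler: it just calls decoder_init() and decoder_start(), and the subtitle branch is similar: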

    case AVMEDIA_TYPE_VIDEO:
        is->video_stream = stream_index;
        is->video_st = ic->streams[stream_index];

        if ((ret = decoder_init(&is->viddec, avctx, &is->videoq, is->continue_read_thread)) < 0)
            goto fail;
        if ((ret = decoder_start(&is->viddec, video_thread, "video_decoder", is)) < 0)
            goto out;
        is->queue_attachments_req = 1;
        break;
    case AVMEDIA_TYPE_SUBTITLE:
        is->subtitle_stream = stream_index;
        is->subtitle_st = ic->streams[stream_index];

        if ((ret = decoder_init(&is->subdec, avctx, &is->subtitleq, is->continue_read_thread)) < 0)
            goto fail;
        if ((ret = decoder_start(&is->subdec, subtitle_thread, "subtitle_decoder", is)) < 0)
            goto out;
        break;

This concludes the analysis of stream_component_open().