Tutorial 05: Synching Video

Latest recommended article published 2021-04-22 22:16:21

oldmtn

Category: ffmpeg's tutorial

Article link: https://blog.csdn.net/oldmtn/article/details/48137051

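I have looked at a lot of examples. This chapter covers video synchronization, which involves a lot of new material, so I will start by translating the tutorial text that accompanies the code.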

CAVEAT

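When this tutorial was first written, all of the syncing code came from ffplay.c. Today ffplay.c is a completely different program, because the ffmpeg libraries (and ffplay.c itself) have changed some of their strategies. The code here still works, but it is not pretty, and there is a lot of room for improvement in this tutorial.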

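How Video Syncs

So far we have built a player that has the basic functions but is essentially useless. It plays the video and it plays the audio, but it is not yet something we could call a movie. So what do we do?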

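PTS and DTS

Fortunately, both the audio and video streams carry information about when they should be played. Audio streams have a sample rate, and video streams have a frames-per-second value. However, if we synchronize the video simply by counting frames and multiplying by the frame rate, it is likely to drift out of sync with the audio. Instead, packets from the stream may carry a decoding time stamp (DTS) and a presentation time stamp (PTS). To understand these two values, you need to know how video is stored.

Some formats, such as MPEG, use B frames (B stands for "bidirectional"). An I frame contains a complete image. A P frame depends on the frames before it and may be stored as just a difference from them. B frames are similar to P frames, but predicting a B frame correctly requires information from frames both before and after it. This is why a call to avcodec_decode_video2 may not return a finished frame.

Suppose a movie displays its frames in the order I B B P. To display either B frame we need the contents of the P frame first, so the frames may well be stored in the order I P B B. This is why each frame has both a decoding time stamp and a presentation time stamp. The DTS tells us when to decode a frame, and the PTS tells us when to display it. For this example the stream might look like this: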

PTS: 1 4 2 3

DTS: 1 2 3 4

Stream: I P B B
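The reordering in the table above can be sketched in a few lines of C. This is only an illustration: DemoFrame, cmp_by_pts and display_order are made-up names, not FFmpeg types or functions. The idea is simply that frames arrive in DTS order and sorting them by PTS recovers the display order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical frame record for illustration; not an FFmpeg type. */
typedef struct {
    char type;  /* 'I', 'P' or 'B' */
    int  dts;   /* position in decode order */
    int  pts;   /* position in display order */
} DemoFrame;

/* qsort comparator: ascending PTS. */
static int cmp_by_pts(const void *a, const void *b) {
    const DemoFrame *fa = (const DemoFrame *)a;
    const DemoFrame *fb = (const DemoFrame *)b;
    return (fa->pts > fb->pts) - (fa->pts < fb->pts);
}

/* Reorders n frames from decode order into display order and writes
   their types into out as a NUL-terminated string (out needs n + 1 bytes). */
void display_order(DemoFrame *frames, size_t n, char *out) {
    assert(frames != NULL && out != NULL);
    qsort(frames, n, sizeof frames[0], cmp_by_pts);
    for (size_t i = 0; i < n; i++) {
        out[i] = frames[i].type;
    }
    out[n] = '\0';
}
```

Feeding it the four frames from the table (I P B B with PTS 1 4 2 3) yields the display order I B B P.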

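PTS and DTS generally differ only when the stream contains B frames. When we get a packet from av_read_frame(), it carries the PTS and DTS of the data inside it. What we really want, though, is the PTS of the newly decoded raw frame, so that we know when to display it.

Fortunately, ffmpeg supplies a "best effort" timestamp for the decoded frame, which you can get with av_frame_get_best_effort_timestamp().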

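Synching

Now that we know when a particular video frame should be shown, how do we actually do it? Here is the idea: after we show a frame, we work out when the next frame should be shown. We then set a timer, and when it expires we post a refresh event to update the video. As you might expect, we compare the PTS of the next frame against the system clock to decide how long the timer should run. This approach works, but there are two issues to take care of.

1. The first issue is how to know what the next PTS will be. You might think we can just add the duration of one frame (derived from the frame rate) to the current PTS, and that is mostly right. However, some kinds of video call for frames to be repeated, which means the current frame must be shown a certain number of extra times. If we ignore this, the program will display the next frame too soon, so we need to account for it.

2. The second issue is that, as the program stands now, the video and the audio each run along on their own and make no attempt to stay in sync. We would not have to worry about that if everything worked perfectly, but your computer is not perfect, and neither are many video files. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (such as the computer's clock). For now, we will sync the video to the audio.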

Coding it: getting the frame PTS

So now we have our PTS set correctly. Next we have to take care of the two synchronization problems described above. We define a function called synchronize_video that updates the PTS so it stays in sync with everything else. This function also handles the case where we do not get a PTS value for a frame. At the same time, we need to keep track of when the next frame is expected so we can set the refresh time properly. We do this with an internal video_clock value, which tracks how much time has passed according to the video. We add this value to our big struct.

typedef struct VideoState {
  ...
  double          video_clock; // pts of last decoded frame / predicted pts of next decoded frame
  ...
} VideoState;

Here is the synchronize_video function, which is fairly self-explanatory:

double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {

  double frame_delay;

  if(pts != 0) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  frame_delay = av_q2d(is->video_st->codec->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}
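To make the repeated-frame arithmetic concrete, here is a minimal sketch of just the delay calculation, pulled out of synchronize_video into a standalone function. The name predicted_frame_delay is made up for this example, and time_base_sec stands in for av_q2d(is->video_st->codec->time_base).

```c
#include <assert.h>

/* Predicted duration of one decoded frame, in seconds.
   time_base_sec : duration of one codec tick
   repeat_pict   : number of extra half-frame repeats requested for this frame */
double predicted_frame_delay(double time_base_sec, int repeat_pict) {
    assert(time_base_sec > 0.0 && repeat_pict >= 0);
    double frame_delay = time_base_sec;
    /* each repeat adds half a frame duration, as in synchronize_video */
    frame_delay += repeat_pict * (frame_delay * 0.5);
    return frame_delay;
}
```

With a 0.04 s time base, a frame with repeat_pict = 0 advances video_clock by 0.04 s, while repeat_pict = 1 advances it by about 0.06 s.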

You'll notice that this function also accounts for repeated frames.

Now let's get our proper PTS and queue up the frame using queue_picture, which gains a new pts argument:

    // Did we get a video frame?
    if(frameFinished) {
      pts = synchronize_video(is, pFrame, pts);
      if(queue_picture(is, pFrame, pts) < 0) {
        break;
      }
    }

The only thing that changes in queue_picture is that we save the pts value in the VideoPicture structure that we queue up. So we have to add a pts member to the struct and add one line of code:

typedef struct VideoPicture {
  ...
  double pts;
} VideoPicture;

int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
  ... stuff ...
  if(vp->bmp) {
    ... convert picture ...
    vp->pts = pts;
    ... alert queue ...
  }
  ...
}

So now we have pictures lining up on our picture queue with proper PTS values, so let's look at our video refresh function. You may recall that last time we just faked it and scheduled a refresh every 80 ms. Now we are going to work out the real value.

Our strategy is to predict the time of the next PTS by measuring the time between the previous pts and the current one. At the same time, we need to sync the video to the audio. We are going to make an audio clock: an internal value that keeps track of the position of the audio we are playing, like the digital readout on an MP3 player. Since we are syncing the video to the audio, the video thread uses this value to figure out whether it is too far ahead or too far behind.

We will get to the implementation later; for now, assume we have a get_audio_clock function that gives us the time on the audio clock. Once we have that value, what do we do if the video and audio are out of sync? It would be silly to try to leap to the correct packet by seeking or something similar. Instead, we just adjust the value we calculated for the next refresh: if the PTS is too far behind the audio time, we refresh as quickly as possible; if the PTS is too far ahead of the audio time, we double our calculated delay.

Now that we have our adjusted refresh time, or delay, we compare it with the computer's clock by keeping a running frame_timer. This frame timer sums up all of our calculated delays while the movie plays. In other words, frame_timer is the time at which the next frame should be displayed. We add the new delay to the frame timer, compare the result with the time on the computer's clock, and use the difference to schedule the next refresh. This might be a bit confusing, so study the code carefully:

void video_refresh_timer(void *userdata) {

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;
  
  if(is->video_st) {
    if(is->pictq_size == 0) {
      schedule_refresh(is, 1);
    } else {
      vp = &is->pictq[is->pictq_rindex];

      delay = vp->pts - is->frame_last_pts; /* the pts from last time */
      if(delay <= 0 || delay >= 1.0) {
	/* if incorrect delay, use previous one */
	delay = is->frame_last_delay;
      }
      /* save for next time */
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      /* update delay to sync to audio */
      ref_clock = get_audio_clock(is);
      diff = vp->pts - ref_clock;

      /* Skip or repeat the frame. Take delay into account
	 FFPlay still doesn't "know if this is the best guess." */
      sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
	if(diff <= -sync_threshold) {
	  delay = 0;
	} else if(diff >= sync_threshold) {
	  delay = 2 * delay;
	}
      }
      is->frame_timer += delay;
      /* compute the REAL delay */
      actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
      if(actual_delay < 0.010) {
	/* Really it should skip the picture instead */
	actual_delay = 0.010;
      }
      schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
      /* show the picture! */
      video_display(is);
      
      /* update queue for next picture! */
      if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
	is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  } else {
    schedule_refresh(is, 100);
  }
}
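The delay adjustment in the middle of video_refresh_timer is easier to reason about in isolation. The sketch below copies only that decision into a standalone function with no FFmpeg or SDL dependency. adjust_delay and the DEMO_ constants are names invented for this example; the constants mirror AV_SYNC_THRESHOLD and AV_NOSYNC_THRESHOLD.

```c
#include <assert.h>

#define DEMO_SYNC_THRESHOLD   0.01  /* mirrors AV_SYNC_THRESHOLD */
#define DEMO_NOSYNC_THRESHOLD 10.0  /* mirrors AV_NOSYNC_THRESHOLD */

/* Returns the delay (seconds) to add to frame_timer for the next frame.
   delay       : gap between the current and previous video pts
   video_pts   : pts of the picture about to be shown
   audio_clock : current position of the audio */
double adjust_delay(double delay, double video_pts, double audio_clock) {
    assert(delay >= 0.0);
    double diff = video_pts - audio_clock;
    double magnitude = diff < 0.0 ? -diff : diff;
    /* the threshold is never smaller than the gap between frames */
    double sync_threshold = delay > DEMO_SYNC_THRESHOLD ? delay : DEMO_SYNC_THRESHOLD;

    if (magnitude < DEMO_NOSYNC_THRESHOLD) {
        if (diff <= -sync_threshold) {
            return 0.0;            /* video is behind the audio: catch up */
        }
        if (diff >= sync_threshold) {
            return 2.0 * delay;    /* video is ahead of the audio: wait longer */
        }
    }
    return delay;                  /* in sync, or too far apart to correct */
}
```

Note that when the clocks are more than 10 seconds apart the delay is left unchanged, which matches the original function: it only corrects differences it considers recoverable.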

We make a few checks. First, we make sure that the delay between the current PTS and the previous PTS makes sense. If it doesn't, we just guess and reuse the last delay. Next, we use a sync threshold, because things are never going to be perfectly in sync; ffplay uses 0.01 for its value. We also make sure that the sync threshold is never smaller than the gap between PTS values. Finally, we set the minimum refresh delay to 10 milliseconds (as the code comment notes, the picture should really be skipped in that case).

We added a bunch of variables to the big struct, so don't forget to check the code. Also, don't forget to initialize the frame timer and the initial previous frame delay in stream_component_open:

is->frame_timer = (double)av_gettime() / 1000000.0;
is->frame_last_delay = 40e-3;

Synching: The Audio Clock

Now it is time to implement the audio clock. We can update the clock in our audio_decode_frame function, which is where we decode the audio. Remember that we do not always process a new packet each time we call this function, so there are two places where we have to update the clock. The first is where we get a new packet: we simply set the audio clock to the packet's PTS. Then, if a packet contains multiple frames, we keep the audio clock current by counting the number of samples played and dividing by the sample rate (samples per second).

So once we have the packet:

    /* if update, update the audio clock w/pts */
    if(pkt->pts != AV_NOPTS_VALUE) {
      is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
    }
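The multiplication by av_q2d(time_base) above converts a timestamp counted in stream ticks into seconds. Here is a minimal sketch of that conversion without FFmpeg; DemoRational and ticks_to_seconds are hypothetical stand-ins for AVRational and the av_q2d-based expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for AVRational: one tick lasts num/den seconds. */
typedef struct {
    int num;
    int den;
} DemoRational;

/* Converts a timestamp in time-base ticks to seconds. */
double ticks_to_seconds(int64_t ticks, DemoRational time_base) {
    assert(time_base.den != 0);
    return (double)ticks * ((double)time_base.num / (double)time_base.den);
}
```

For example, with a time base of 1/90000 (common for MPEG transport streams), a pts of 90000 corresponds to roughly one second.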

And once we are processing the packet:

      /* Keep audio_clock up-to-date */
      pts = is->audio_clock;
      *pts_ptr = pts;
      n = 2 * is->audio_st->codec->channels;
      is->audio_clock += (double)data_size /
	(double)(n * is->audio_st->codec->sample_rate);
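The clock increment above is just bytes divided by bytes per second. The sketch below isolates that calculation; audio_bytes_to_seconds is a name invented for this example, and it assumes 16-bit (2-byte) samples exactly as the tutorial code does.

```c
#include <assert.h>

/* Duration, in seconds, of data_size bytes of interleaved 16-bit PCM. */
double audio_bytes_to_seconds(int data_size, int channels, int sample_rate) {
    assert(data_size >= 0 && channels > 0 && sample_rate > 0);
    int bytes_per_sample_frame = 2 * channels;  /* 2 bytes per 16-bit sample */
    return (double)data_size / (double)(bytes_per_sample_frame * sample_rate);
}
```

One second of 44.1 kHz stereo audio is 44100 * 2 * 2 = 176400 bytes, so decoding that much data advances audio_clock by exactly one second.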

A few fine details: the signature of the function has changed to include pts_ptr, so make sure you change that. pts_ptr is a pointer we use to tell audio_callback the pts of the audio packet. It will be used next time for synchronizing the audio with the video.

Now we can finally implement our get_audio_clock function. It is not as simple as reading the is->audio_clock value, though. Notice that we set the audio PTS every time we process audio, but if you look at the audio_callback function, it takes time to move all the data from our audio packet into the output buffer. That means the value in our audio clock could be too far ahead, so we have to check how much data we still have left to write. Here is the complete code:

double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size, bytes_per_sec, n;
  
  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  if(is->audio_st) {
    /* check audio_st before dereferencing it; 2 bytes per 16-bit sample */
    n = is->audio_st->codec->channels * 2;
    bytes_per_sec = is->audio_st->codec->sample_rate * n;
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}
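As a quick sanity check of the correction in get_audio_clock, here is the same arithmetic as a standalone function with plain arguments instead of a VideoState. The name audio_clock_now is made up for this sketch, and 16-bit samples are assumed as in the tutorial.

```c
#include <assert.h>

/* Returns the timestamp of the audio that is playing right now.
   audio_clock : time at which the data in the buffer finishes playing
   buf_size    : bytes of decoded audio in the buffer
   buf_index   : bytes already copied to the audio device */
double audio_clock_now(double audio_clock, unsigned buf_size, unsigned buf_index,
                       int channels, int sample_rate) {
    assert(buf_index <= buf_size);
    int bytes_per_sec = sample_rate * channels * 2;  /* 16-bit samples */
    unsigned pending = buf_size - buf_index;         /* bytes not yet played */
    if (bytes_per_sec > 0) {
        audio_clock -= (double)pending / bytes_per_sec;
    }
    return audio_clock;
}
```

If audio_clock is 2.0 s and half a second of 44.1 kHz stereo audio (88200 bytes) is still waiting in the buffer, the audio actually being heard is at 1.5 s.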

The complete code:

#include "stdafx.h"

#ifdef TUTORIAL_05
// tutorial05.c
// A pedagogical video player that really works!
//
// This tutorial was written by Stephen Dranger (dranger@gmail.com).
//
// Code based on FFplay, Copyright (c) 2003 Fabrice Bellard, 
// and a tutorial by Martin Bohme (boehme@inb.uni-luebeckREMOVETHIS.de)
// Tested on Gentoo, CVS version 5/01/07 compiled with GCC 4.1.1
//
// Use the Makefile to build all the samples.
//
// Run using
// tutorial05 myvideofile.mpg
//
// to play the video.


extern "C"
{
#include "libavutil/avstring.h"
#include "libavutil/mathematics.h"
#include "libavutil/pixdesc.h"
#include "libavutil/imgutils.h"
#include "libavutil/dict.h"
#include "libavutil/parseutils.h"
#include "libavutil/samplefmt.h"
#include "libavutil/avassert.h"
#include "libavutil/time.h"
#include "libavformat/avformat.h"
#include "libavdevice/avdevice.h"
#include "libswscale/swscale.h"
#include "libavutil/opt.h"
#include "libavcodec/avfft.h"
#include "libswresample/swresample.h"

#include "SDL1.2/SDL.h"
#include "SDL1.2/SDL_thread.h"
}

#pragma comment(lib, "avcodec.lib")
#pragma comment(lib, "avformat.lib")
#pragma comment(lib, "avutil.lib")
#pragma comment(lib, "avdevice.lib")
#pragma comment(lib, "avfilter.lib")
#pragma comment(lib, "postproc.lib")
#pragma comment(lib, "swresample.lib")
#pragma comment(lib, "swscale.lib")
#pragma comment(lib, "SDL.lib")

#ifdef __MINGW32__
#undef main /* Prevents SDL from overriding main() */
#endif

#include <stdio.h>
#include <math.h>
#include <assert.h>
#include <string.h>

#define SDL_AUDIO_BUFFER_SIZE               1024
#define MAX_AUDIO_FRAME_SIZE                192000

#define MAX_AUDIOQ_SIZE                     (5 * 16 * 1024)
#define MAX_VIDEOQ_SIZE                     (5 * 256 * 1024)

#define AV_SYNC_THRESHOLD                   0.01
#define AV_NOSYNC_THRESHOLD                 10.0

#define FF_ALLOC_EVENT                      (SDL_USEREVENT)
#define FF_REFRESH_EVENT                    (SDL_USEREVENT + 1)
#define FF_QUIT_EVENT                       (SDL_USEREVENT + 2)

#define VIDEO_PICTURE_QUEUE_SIZE            1


// BD
int         g_iIndex_video_pkt = 0;
// ED

typedef struct PacketQueue {
    AVPacketList *first_pkt, *last_pkt;
    int nb_packets;
    int size;
    SDL_mutex *mutex;
    SDL_cond *cond;
} PacketQueue;


typedef struct VideoPicture {
    SDL_Overlay *bmp;
    int width, height; /* source height & width */
    int allocated;
    double pts;

    // BD
    AVPictureType type;
    int iIndex;
    // ED
} VideoPicture;

typedef struct VideoState {

    AVFormatContext *pFormatCtx;
    int             videoStream, audioStream;

    // audio
    double          audio_clock;
    AVStream        *audio_st;
    PacketQueue     audioq;
    AVFrame         audio_frame;
    uint8_t         audio_buf[(MAX_AUDIO_FRAME_SIZE * 3) / 2];
    unsigned int    audio_buf_size;
    unsigned int    audio_buf_index;
    AVPacket        audio_pkt;
    uint8_t         *audio_pkt_data;
    int             audio_pkt_size;
    int             audio_hw_buf_size;
    double          frame_timer;
    double          frame_last_pts;
    double          frame_last_delay;

    // video
    double          video_clock; ///<pts of last decoded frame / predicted pts of next decoded frame
    AVStream        *video_st;
    PacketQueue     videoq;

    VideoPicture    pictq[VIDEO_PICTURE_QUEUE_SIZE];
    int             pictq_size, pictq_rindex, pictq_windex;
    SDL_mutex       *pictq_mutex;
    SDL_cond        *pictq_cond;

    SDL_Thread      *parse_tid;
    SDL_Thread      *video_tid;

    char            filename[1024];
    int             quit;

    AVIOContext     *io_context;
    struct SwsContext *sws_ctx;
} VideoState;

SDL_Surface     *screen;

/* Since we only have one decoding thread, the Big Struct
can be global in case we need it. */
VideoState *global_video_state;

struct SwrContext *swr_ctx;
DECLARE_ALIGNED(16, uint8_t, audio_buf2)[MAX_AUDIO_FRAME_SIZE * 4];

static inline double rint(double x)
{
    return x >= 0 ? floor(x + 0.5) : ceil(x - 0.5);
}

void packet_queue_init(PacketQueue *q) {
    memset(q, 0, sizeof(PacketQueue));
    q->mutex = SDL_CreateMutex();
    q->cond = SDL_CreateCond();
}

int packet_queue_put(PacketQueue *q, AVPacket *pkt) {
    AVPacketList *pkt1;
    if( av_dup_packet(pkt) < 0 ) {
        return -1;
    }
    pkt1 = (AVPacketList *)av_malloc(sizeof(AVPacketList));
    if( !pkt1 ) {
        return -1;
    }

    pkt1->pkt = *pkt;
    pkt1->next = NULL;

    SDL_LockMutex(q->mutex);

    if( !q->last_pkt ) {
        q->first_pkt = pkt1;
    } else {
        q->last_pkt->next = pkt1;
    }

    q->last_pkt = pkt1;
    q->nb_packets ++;
    q->size += pkt1->pkt.size;
    SDL_CondSignal(q->cond);

    SDL_UnlockMutex(q->mutex);
    return 0;
}
static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block)
{
    AVPacketList *pkt1;
    int ret;

    SDL_LockMutex(q->mutex);

    for( ; ; ) {
        if( global_video_state->quit ) {
            ret = -1;
            break;
        }

        pkt1 = q->first_pkt;
        if( pkt1 ) {
            q->first_pkt = pkt1->next;
            if( !q->first_pkt ) {
                q->last_pkt = NULL;
            }

            q->nb_packets --;
            q->size -= pkt1->pkt.size;
            *pkt = pkt1->pkt;
            av_free(pkt1);
            ret = 1;
            break;
        } else if( !block ) {
            ret = 0;
            break;
        } else {
            SDL_CondWait(q->cond, q->mutex);
        }
    }

    SDL_UnlockMutex(q->mutex);
    return ret;
}

double get_audio_clock(VideoState *is)
{
    double pts;
    int hw_buf_size, bytes_per_sec, n;

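    // audio_clock is the time at which the data currently in audio_buf finishes playing
    pts = is->audio_clock; /* maintained in the audio thread */
    // Bytes in audio_buf that have not been handed to the audio device yet
    hw_buf_size = is->audio_buf_size - is->audio_buf_index;
    bytes_per_sec = 0;

    // Bytes of audio per second; check audio_st before dereferencing it
    if( is->audio_st ) {
        n = is->audio_st->codec->channels * 2;
        bytes_per_sec = is->audio_st->codec->sample_rate * n;
    }

    // (double)hw_buf_size / bytes_per_sec is the time still needed to play the remaining data;
    // subtracting it from pts gives the timestamp of the audio playing right now
    if( bytes_per_sec ) {
        pts -= (double)hw_buf_size / bytes_per_sec;
    }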
    return pts;
}

int audio_decode_frame(VideoState *is, double *pts_ptr)
{
    int len1, data_size = 0, n;
    AVPacket *pkt = &is->audio_pkt;
    double pts;

    for( ; ; ) {
        while( is->audio_pkt_size > 0 ) {
            int got_frame;
            len1 = avcodec_decode_audio4(is->audio_st->codec, &is->audio_frame, &got_frame, pkt);
            if( len1 < 0 ) {
                /* if error, skip frame */
                is->audio_pkt_size = 0;
                break;
            }

            if( got_frame ) {
                AVCodecContext* aCodecCtx = is->audio_st->codec;

                uint64_t dec_channel_layout =
                    (aCodecCtx->channel_layout && aCodecCtx->channels == av_get_channel_layout_nb_channels(aCodecCtx->channel_layout)) ?
                    aCodecCtx->channel_layout : av_get_default_channel_layout(aCodecCtx->channels);

                AVSampleFormat tgtFmt = AV_SAMPLE_FMT_S16;
                if( aCodecCtx->sample_fmt != tgtFmt ) {
                    // Resampling is needed
                    if( swr_ctx == NULL ) {
                        swr_ctx = swr_alloc();
                        swr_ctx = swr_alloc_set_opts(swr_ctx,
                            dec_channel_layout, tgtFmt, aCodecCtx->sample_rate,
                            dec_channel_layout, aCodecCtx->sample_fmt, aCodecCtx->sample_rate, 0, NULL);

                        if( !swr_ctx || swr_init(swr_ctx) < 0 ) {
                            assert(false);
                        }
                    }

                    if( swr_ctx ) {
                        const uint8_t **in = (const uint8_t **)is->audio_frame.extended_data;
                        uint8_t *out[] = {audio_buf2};
                        int out_count = sizeof(audio_buf2) / aCodecCtx->channels / av_get_bytes_per_sample(aCodecCtx->sample_fmt);

                        int len2 = swr_convert(swr_ctx, out, out_count, in, is->audio_frame.nb_samples);
                        if( len2 < 0 ) {
                            LogPrintfA("swr_convert() failed\n");
                            break;
                        }
                        if( len2 == out_count ) {
                            LogPrintfA("warning: audio buffer is probably too small\n");
                            swr_init(swr_ctx);
                        }

                        data_size = len2 * aCodecCtx->channels * av_get_bytes_per_sample(tgtFmt);
                        memcpy(is->audio_buf, audio_buf2, data_size);
                    }
                } else {
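                    // No resampling needed
                    data_size = av_samples_get_buffer_size(NULL,
                        aCodecCtx->channels,
                        is->audio_frame.nb_samples,
                        aCodecCtx->sample_fmt,
                        1);
                    // Check against the buffer capacity; audio_buf_size only holds the size of the previous chunk
                    assert(data_size <= (int)sizeof(is->audio_buf));
                    memcpy(is->audio_buf, is->audio_frame.data[0], data_size);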
                }
            }
            is->audio_pkt_data += len1;
            is->audio_pkt_size -= len1;
            if( data_size <= 0 ) {
                /* No data yet, get more frames */
                continue;
            }

            pts = is->audio_clock;
            *pts_ptr = pts;
            // 2 is the number of bytes per sample for 16-bit audio; change it for other sample sizes
            // This computes how long the current audio buffer takes to play
            n = 2 * is->audio_st->codec->channels;
            is->audio_clock += (double)data_size /
                (double)(n * is->audio_st->codec->sample_rate);
            
            //LogPrintf(_T("is->audio_clock: %f, plus: %f\n"), is->audio_clock, (double)data_size / (double)(n * is->audio_st->codec->sample_rate) );
            
            /* We have data, return it and come back for more later */
            return data_size;
        }
        if( pkt->data ) {
            av_free_packet(pkt);
        }

        if( is->quit ) {
            return -1;
        }
        /* next packet */
        if( packet_queue_get(&is->audioq, pkt, 1) < 0 ) {
            return -1;
        }

        is->audio_pkt_data = pkt->data;
        is->audio_pkt_size = pkt->size;
        /* if update, update the audio clock w/pts */
        if( pkt->pts != AV_NOPTS_VALUE ) {
            is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
        }
    }
}

void audio_callback(void *userdata, Uint8 *stream, int len)
{
    VideoState *is = (VideoState *)userdata;
    int len1, audio_size;
    double pts;

    while( len > 0 ) {
        if(is->audio_buf_index >= is->audio_buf_size) {
            /* We have already sent all our data; get more */
            audio_size = audio_decode_frame(is, &pts);
            if( audio_size < 0 ) {
                /* If error, output silence */
                is->audio_buf_size = 1024;
                memset(is->audio_buf, 0, is->audio_buf_size);
            } else {
                is->audio_buf_size = audio_size;
            }
            is->audio_buf_index = 0;
        }
        len1 = is->audio_buf_size - is->audio_buf_index;
        if( len1 > len ) {
            len1 = len;
        }
        memcpy(stream, (uint8_t *)is->audio_buf + is->audio_buf_index, len1);
        len -= len1;
        stream += len1;
        is->audio_buf_index += len1;
    }
}

static Uint32 sdl_refresh_timer_cb(Uint32 interval, void *opaque)
{
    SDL_Event event;
    event.type = FF_REFRESH_EVENT;
    event.user.data1 = opaque;
    SDL_PushEvent(&event);
    return 0; /* 0 means stop timer */
}

/* schedule a video refresh in 'delay' ms */
static void schedule_refresh(VideoState *is, int delay)
{
    SDL_AddTimer(delay, sdl_refresh_timer_cb, is);
}

void video_display(VideoState *is)
{
    SDL_Rect rect;
    VideoPicture *vp;
    //AVPicture pict;
    float aspect_ratio;
    int w, h, x, y;
    //int i;

    vp = &is->pictq[is->pictq_rindex];
    if( vp->bmp ) {
        if(is->video_st->codec->sample_aspect_ratio.num == 0) {
            aspect_ratio = 0;
        } else {
            aspect_ratio = av_q2d(is->video_st->codec->sample_aspect_ratio) *
                is->video_st->codec->width / is->video_st->codec->height;
        }
        if( aspect_ratio <= 0.0 ) {
            aspect_ratio = (float)is->video_st->codec->width /
                (float)is->video_st->codec->height;
        }
        h = screen->h;
        w = ((int)rint(h * aspect_ratio)) & -3;
        if( w > screen->w ) {
            w = screen->w;
            h = ((int)rint(w / aspect_ratio)) & -3;
        }
        x = (screen->w - w) / 2;
        y = (screen->h - h) / 2;

        rect.x = x;
        rect.y = y;
        rect.w = w;
        rect.h = h;
         
        // BD
        //LogPrintfA("---------------------------------------------------------- [%05d] refresh bmp, Packet:%d, type: %s, pts: %f\n",
        //    ::GetCurrentThreadId(), vp->iIndex, GetPictureTypeString(vp->type).c_str(), vp->pts);
        // ED

        SDL_DisplayYUVOverlay(vp->bmp, &rect);
    }
}

void video_refresh_timer(void *userdata)
{
    VideoState *is = (VideoState *)userdata;
    VideoPicture *vp;
    double actual_delay, delay, sync_threshold, ref_clock, diff;

    if( is->video_st ) {
        if( is->pictq_size == 0 ) {
            schedule_refresh(is, 1);
        } else {
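            // Goal: work out when the next picture should be displayed
            vp = &is->pictq[is->pictq_rindex];

            // frame_last_pts holds the pts of the previous picture; subtracting it from the
            // current pts gives an estimated delay, i.e. how long the previous picture was shown
            delay = vp->pts - is->frame_last_pts; /* the pts from last time */

            // BD
            static int iIndex = 0;
            //LogPrintfA("previous frame duration: %f\n", delay);
            // ED
            // delay must fall within a valid range; otherwise reuse the previous delay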
            if( delay <= 0 || delay >= 1.0 ) {
                /* if incorrect delay, use previous one */
                delay = is->frame_last_delay;
            }

            /* save for next time */
            is->frame_last_delay = delay;
            // Save the pts of the current frame
            is->frame_last_pts = vp->pts;
            
            /* update delay to sync to audio */
            // ref_clock: timestamp of the audio currently playing
            ref_clock = get_audio_clock(is);
            diff = vp->pts - ref_clock;
            
            // BD
            //LogPrintfA("vp->pts: %f, ref_clock: %f, diff: %f; delay: %f\n", vp->pts, ref_clock, diff, delay);
            // ED
            
            /* Skip or repeat the frame. Take delay into account
            FFPlay still doesn't "know if this is the best guess." */
            // Use the larger of delay and AV_SYNC_THRESHOLD
            // new
            sync_threshold = FFMAX(delay, AV_SYNC_THRESHOLD);
            // Only adjust the delay while |diff| is below AV_NOSYNC_THRESHOLD
            if( fabs(diff) < AV_NOSYNC_THRESHOLD ) {
                if( diff <= -sync_threshold ) { // diff is well below zero: the video frame is behind the master clock, so the next picture should be shown sooner (delay = 0)
                    delay = 0;
                } else if( diff >= sync_threshold ) { // diff is well above zero: the video frame is ahead of the master clock, so the next picture should be shown later
                    delay = 2 * delay;
                } else {
                    // diff is within tolerance; use the delay as computed
                    // LogPrintfA("abcd\n");
                }
            } else {
                assert(false);
            }

            // BD
            double frame_timer_old = is->frame_timer;
            // ED

            // frame_timer accumulates the delays; after adding delay it is the time at which the next picture should be shown
            is->frame_timer += delay;
            /* compute the REAL delay */
            // Subtracting the current system time from frame_timer gives actual_delay
            actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
            if( actual_delay < 0.010 ) {
                /* Really it should skip the picture instead */
                actual_delay = 0.010;
            }
            schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
            
            /* show the picture! */
            video_display(is);

            /* update queue for next picture! */
            if( ++ is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE ) {
                is->pictq_rindex = 0;
            }
            SDL_LockMutex(is->pictq_mutex);
            is->pictq_size--;
            SDL_CondSignal(is->pictq_cond);
            SDL_UnlockMutex(is->pictq_mutex);
        }
    } else {
        schedule_refresh(is, 100);
    }
}

void alloc_picture(void *userdata) {
    VideoState *is = (VideoState *)userdata;
    VideoPicture *vp;

    vp = &is->pictq[is->pictq_windex];
    if( vp->bmp ) {
        // we already have one make another, bigger/smaller
        SDL_FreeYUVOverlay(vp->bmp);
    }

    // Allocate a place to put our YUV image on that screen
    vp->bmp = SDL_CreateYUVOverlay(is->video_st->codec->width,
        is->video_st->codec->height,
        SDL_YV12_OVERLAY,
        screen);
    vp->width = is->video_st->codec->width;
    vp->height = is->video_st->codec->height;

    SDL_LockMutex(is->pictq_mutex);
    vp->allocated = 1;
    SDL_CondSignal(is->pictq_cond);
    SDL_UnlockMutex(is->pictq_mutex);
}

int queue_picture(VideoState *is, AVFrame *pFrame, double pts, int iIndex)
{
    VideoPicture *vp;
    AVPicture pict;

    /* wait until we have space for a new pic */
    SDL_LockMutex(is->pictq_mutex);
    while( is->pictq_size >= VIDEO_PICTURE_QUEUE_SIZE && !is->quit ) {
        SDL_CondWait(is->pictq_cond, is->pictq_mutex);
    }
    SDL_UnlockMutex(is->pictq_mutex);

    if( is->quit ) {
        return -1;
    }

    // windex is set to 0 initially
    vp = &is->pictq[is->pictq_windex];

    /* allocate or resize the buffer! */
    if( !vp->bmp ||
        vp->width != is->video_st->codec->width ||
        vp->height != is->video_st->codec->height ) {
            SDL_Event event;

            vp->allocated = 0;
            /* we have to do it in the main thread */
            event.type = FF_ALLOC_EVENT;
            event.user.data1 = is;
            SDL_PushEvent(&event);

            /* wait until we have a picture allocated */
            SDL_LockMutex(is->pictq_mutex);
            while( !vp->allocated && !is->quit ) {
                SDL_CondWait(is->pictq_cond, is->pictq_mutex);
            }
            SDL_UnlockMutex(is->pictq_mutex);
            if( is->quit ) {
                return -1;
            }
    }

    /* We have a place to put our picture on the queue */
    /* If we are skipping a frame, do we set this to null
    but still return vp->allocated = 1? */

    if( vp->bmp ) {
        SDL_LockYUVOverlay(vp->bmp);

        /* point pict at the queue */

        pict.data[0] = vp->bmp->pixels[0];
        pict.data[1] = vp->bmp->pixels[2];
        pict.data[2] = vp->bmp->pixels[1];

        pict.linesize[0] = vp->bmp->pitches[0];
        pict.linesize[1] = vp->bmp->pitches[2];
        pict.linesize[2] = vp->bmp->pitches[1];

        // Convert the image into YUV format that SDL uses
        sws_scale
            (
            is->sws_ctx,
            (uint8_t const * const *)pFrame->data,
            pFrame->linesize,
            0,
            is->video_st->codec->height,
            pict.data,
            pict.linesize
            );

        SDL_UnlockYUVOverlay(vp->bmp);
        vp->pts = pts;

        // BD
        vp->type = pFrame->pict_type;
        vp->iIndex = iIndex;
        // ED

        /* now we inform our display thread that we have a pic ready */
        if( ++ is->pictq_windex == VIDEO_PICTURE_QUEUE_SIZE ) {
            is->pictq_windex = 0;
        }
        SDL_LockMutex(is->pictq_mutex);
        is->pictq_size++;
        SDL_UnlockMutex(is->pictq_mutex);
    }

    return 0;
}

/*
 * Simply computes the value of video_clock
 */
double synchronize_video(VideoState *is, AVFrame *src_frame, double pts)
{
    double frame_delay;

    if( pts != 0 ) {
        /* if we have pts, set video clock to it */
        is->video_clock = pts;
    } else {
        /* if we aren't given a pts, set it to the clock */
        pts = is->video_clock;
    }

    /* update the video clock */
    // If the video is 25 fps, one frame lasts 0.04 s, but time_base here is 1/50, i.e. 0.02 s
    frame_delay = av_q2d(is->video_st->codec->time_base);

    /* if we are repeating a frame, adjust clock accordingly */
    frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);

    is->video_clock += frame_delay;
    
    return pts;
}

uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

/* These are called whenever we allocate a frame
* buffer. We use this to store the global_pts in
* a frame at the time it is allocated.
*/
int our_get_buffer(struct AVCodecContext *c, AVFrame *pic, int flags)
{
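    // installed as get_buffer2 below, so forward to the matching default and pass flags through
    int ret = avcodec_default_get_buffer2(c, pic, flags);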
    uint64_t *pts = (uint64_t *)av_malloc(sizeof(uint64_t));
    *pts = global_video_pkt_pts;
    pic->opaque = pts;

    return ret;
}
void our_release_buffer(struct AVCodecContext *c, AVFrame *pic)
{
    if( pic ) {
        av_freep(&pic->opaque);
    }

    avcodec_default_release_buffer(c, pic);
}

int video_thread(void *arg)
{
    VideoState *is = (VideoState *)arg;
    AVPacket pkt1, *packet = &pkt1;
    int frameFinished;
    AVFrame *pFrame;
    double pts;

    pFrame = av_frame_alloc();

    for( ; ; ) {
        if( packet_queue_get(&is->videoq, packet, 1) < 0 ) {
            // means we quit getting packets
            break;
        }
        pts = 0;

        // Save global pts to be stored in pFrame in first call
        global_video_pkt_pts = packet->pts;

        // Decode video frame
        int iRet = avcodec_decode_video2(is->video_st->codec, pFrame, &frameFinished, packet);
        if( iRet < 0 ) {
            // error
            int a=2;
            int b=a;
        } else if( iRet == 0 ) {
            // no frame could be decompressed
            int a=2;
            int b=a;
        } else {
            // ok
        }

        // BD
        LogPrintfA("[%05d] Packet:%d, type: %s, dts: %I64d, pts: %I64d\n", ::GetCurrentThreadId(),
                ++ g_iIndex_video_pkt, GetPictureTypeString(pFrame->pict_type).c_str(),
                packet->dts, packet->pts);
        // ED

        if( packet->dts == AV_NOPTS_VALUE
            && pFrame->opaque
            && *(uint64_t*)pFrame->opaque != AV_NOPTS_VALUE ) {
                pts = *(uint64_t *)pFrame->opaque;
        } else if( packet->dts != AV_NOPTS_VALUE ) {
            pts = packet->dts;
        } else {
            pts = 0;
        }
        // Convert pts from time_base units to seconds, i.e. the frame's position within the video
        pts *= av_q2d(is->video_st->time_base);
        
        // BD
        AVRational a1 = is->video_st->r_frame_rate;
        int64_t ptsBst = av_frame_get_best_effort_timestamp(pFrame);
        double ptsOld = pts;
        if( AV_PICTURE_TYPE_I == pFrame->pict_type ) {
            int a=2;
            int b=a;
        }
        // ED

        // Did we get a video frame?
        if( frameFinished ) {
            pts = synchronize_video(is, pFrame, pts);

            // BD
            if( ptsOld != pts ) {
                int a=2;
                int b=a;
            }
            //LogPrintfA("[%05d] Packet:%d, truely pts: %f\n", ::GetCurrentThreadId(), g_iIndex_video_pkt, pts);
            // ED
            
            if( queue_picture(is, pFrame, pts, g_iIndex_video_pkt) < 0 ) {
                break;
            }
        }
        av_free_packet(packet);
    }

    // pFrame came from av_frame_alloc(), so release it with av_frame_free()
    av_frame_free(&pFrame);
    return 0;
}

int stream_component_open(VideoState *is, int stream_index)
{
    AVFormatContext *pFormatCtx = is->pFormatCtx;
    AVCodecContext *codecCtx = NULL;
    AVCodec *codec = NULL;
    AVDictionary *optionsDict = NULL;
    SDL_AudioSpec wanted_spec, spec;

    if(stream_index < 0 || stream_index >= pFormatCtx->nb_streams) {
        return -1;
    }

    // Get a pointer to the codec context for the video stream
    codecCtx = pFormatCtx->streams[stream_index]->codec;

    if( codecCtx->codec_type == AVMEDIA_TYPE_AUDIO ) {
        // Set audio settings from codec info
        wanted_spec.freq = codecCtx->sample_rate;
        wanted_spec.format = AUDIO_S16SYS;
        wanted_spec.channels = codecCtx->channels;
        wanted_spec.silence = 0;
        wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
        wanted_spec.callback = audio_callback;
        wanted_spec.userdata = is;

        if( SDL_OpenAudio(&wanted_spec, &spec) < 0 ) {
            fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
            return -1;
        }
        is->audio_hw_buf_size = spec.size;
    }
    codec = avcodec_find_decoder(codecCtx->codec_id);

    if( !codec || (avcodec_open2(codecCtx, codec, &optionsDict) < 0) ) {
        fprintf(stderr, "Unsupported codec!\n");
        return -1;
    }

    switch( codecCtx->codec_type ) {
    case AVMEDIA_TYPE_AUDIO:
        {
            is->audioStream = stream_index;
            is->audio_st = pFormatCtx->streams[stream_index];
            is->audio_buf_size = 0;
            is->audio_buf_index = 0;
            memset(&is->audio_pkt, 0, sizeof(is->audio_pkt));
            packet_queue_init(&is->audioq);
            SDL_PauseAudio(0);
        }
        break;
    case AVMEDIA_TYPE_VIDEO:
        {
            is->videoStream = stream_index;
            is->video_st = pFormatCtx->streams[stream_index];

            is->frame_timer = (double)av_gettime() / 1000000.0;
            is->frame_last_delay = 40e-3;

            // BD
            LogPrintfA("init: frame_timer: %f, frame_last_delay: %f\n", is->frame_timer, is->frame_last_delay);
            // ED

            packet_queue_init(&is->videoq);
            is->video_tid = SDL_CreateThread(video_thread, is);
            is->sws_ctx =
                sws_getContext
                (
                is->video_st->codec->width,
                is->video_st->codec->height,
                is->video_st->codec->pix_fmt,
                is->video_st->codec->width,
                is->video_st->codec->height,
                PIX_FMT_YUV420P,
                SWS_BILINEAR,
                NULL,
                NULL,
                NULL
                );
            codecCtx->get_buffer2 = our_get_buffer;
            codecCtx->release_buffer = our_release_buffer;
        }
        break;
    default:
        break;
    }

    return 0;
}

int decode_interrupt_cb(void *opaque) {
    return (global_video_state && global_video_state->quit);
}

int decode_thread(void *arg)
{
    VideoState *is = (VideoState *)arg;
    AVFormatContext *pFormatCtx = NULL;
    AVPacket pkt1, *packet = &pkt1;

    AVDictionary *io_dict = NULL;
    AVIOInterruptCB callback;

    int video_index = -1;
    int audio_index = -1;
    int i;

    is->videoStream = -1;
    is->audioStream = -1;

    global_video_state = is;
    // will interrupt blocking functions if we quit!
    callback.callback = decode_interrupt_cb;
    callback.opaque = is;
    if( avio_open2(&is->io_context, is->filename, 0, &callback, &io_dict) ) {
        fprintf(stderr, "Unable to open I/O for %s\n", is->filename);
        return -1;
    }

    // Open video file
    if( avformat_open_input(&pFormatCtx, is->filename, NULL, NULL) != 0 ) {
        return -1; // Couldn't open file
    }

    is->pFormatCtx = pFormatCtx;

    // Retrieve stream information
    if( avformat_find_stream_info(pFormatCtx, NULL) < 0 ) {
        return -1; // Couldn't find stream information
    }

    // Dump information about file onto standard error
    av_dump_format(pFormatCtx, 0, is->filename, 0);

    // Find the first video stream
    for( i = 0; i < pFormatCtx->nb_streams; i++ ) {
        if( pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO &&
            video_index < 0 ) {
                video_index = i;
        }
        if( pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO &&
            audio_index < 0 ) {
                audio_index = i;
        }
    }
    if( audio_index >= 0 ) {
        stream_component_open(is, audio_index);
    }
    if( video_index >= 0 ) {
        stream_component_open(is, video_index);
    }

    if( is->videoStream < 0 || is->audioStream < 0 ) {
        fprintf(stderr, "%s: could not open codecs\n", is->filename);
        goto fail;
    }

    // Begin -- set video size by oldmtn
    // Make a screen to put our video
    int width = pFormatCtx->streams[video_index]->codec->width;
    int height = pFormatCtx->streams[video_index]->codec->height;
    screen = SDL_SetVideoMode(width, height, 0, 0);
    if( !screen ) {
        fprintf(stderr, "SDL: could not set video mode - exiting\n");
        exit(1);
    }
    // End -- set video size by oldmtn

    // main decode loop

    for( ; ; ) {
        if( is->quit ) {
            break;
        }

        // seek stuff goes here
        if( is->audioq.size > MAX_AUDIOQ_SIZE ||
            is->videoq.size > MAX_VIDEOQ_SIZE ) {
                SDL_Delay(10);
                continue;
        }

        if( av_read_frame(is->pFormatCtx, packet) < 0 ) {
            if( is->pFormatCtx->pb->error == 0 ) {
                SDL_Delay(100); /* no error; wait for user input */
                continue;
            } else {
                break;
            }
        }

        // Is this a packet from the video stream?
        if( packet->stream_index == is->videoStream ) {
            packet_queue_put(&is->videoq, packet);
        } else if( packet->stream_index == is->audioStream ) {
            packet_queue_put(&is->audioq, packet);
        } else {
            av_free_packet(packet);
        }
    }

    /* all done - wait for it */
    while( !is->quit ) {
        SDL_Delay(100);
    }

fail:
    {
        SDL_Event event;
        event.type = FF_QUIT_EVENT;
        event.user.data1 = is;
        SDL_PushEvent(&event);
    }
    return 0;
}

int _tmain() {

    SDL_Event       event;

    VideoState      *is;

    is = (VideoState *)av_mallocz(sizeof(VideoState));

    //char szFile[] = "cuc_ieschool.flv";
    char szFile[] = "edu.flv";
    //char szFile[] = "song.flv";
    //char szFile[] = "drj.mkv";
    //char szFile[] = "city.mkv";

    // Register all formats and codecs
    av_register_all();

    if(SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER)) {
        fprintf(stderr, "Could not initialize SDL - %s\n", SDL_GetError());
        exit(1);
    }

    av_strlcpy(is->filename, szFile, 1024);

    is->pictq_mutex = SDL_CreateMutex();
    is->pictq_cond = SDL_CreateCond();

    schedule_refresh(is, 40);

    is->parse_tid = SDL_CreateThread(decode_thread, is);
    if(!is->parse_tid) {
        av_free(is);
        return -1;
    }

    for( ; ; ) {
        SDL_WaitEvent(&event);
        switch(event.type) {
        case FF_QUIT_EVENT:
        case SDL_QUIT:
            is->quit = 1;
            /*
            * If the video has finished playing, then both the picture and
            * audio queues are waiting for more data.  Make them stop
            * waiting and terminate normally.
            */
            SDL_CondSignal(is->audioq.cond);
            SDL_CondSignal(is->videoq.cond);
            SDL_Quit();
            exit(0);
            break;
        case FF_ALLOC_EVENT:
            alloc_picture(event.user.data1);
            break;
        case FF_REFRESH_EVENT:
            video_refresh_timer(event.user.data1);
            break;
        default:
            break;
        }
    }

    return 0;
}


#endif // TUTORIAL_05

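PTS and DTS

I have not been working with FFmpeg applications for long. There are eight tutorials in all, and I have now reached the fifth; it took me the longest and was the hardest to understand. The tutorial first splits the movie file into audio and video, and every packet carries its own PTS. Audio is paced automatically by the sound card clock, so the audio PTS is what the video gets synchronized against.

Audio and video each have a variable that tracks how much has been played so far, audio_clock and video_clock. ffmpeg-tutorial05 compares these two clocks to adjust the delay of the current video frame, and that is how it keeps audio and video in sync.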

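Fortunately, both the audio and the video streams carry information about how fast and when they should be played. Audio streams have a sample rate, and video streams have a frames-per-second value. However, if we synchronize the video simply by counting frames and multiplying by the frame rate, it is quite likely to drift out of sync with the audio. Instead, packets in the stream may carry a DTS (decoding timestamp) and a PTS (presentation timestamp). To understand these two values, you need to know how movies are stored. Some formats, such as MPEG, use what are called B-frames (B stands for bidirectional). The other two kinds of frame are I-frames and P-frames (I for intra, i.e. a key frame, and P for predicted). An I-frame contains a complete image. A P-frame depends on previous I- and P-frames and is encoded as a difference from them. A B-frame is similar to a P-frame, but depends on information in both earlier and later frames. This explains why we may not get a finished frame after a call to avcodec_decode_video (avcodec_decode_video2 in the listing above).

So suppose a movie's frames are displayed in the order I B B P. We need the information in the P-frame before we can display either B-frame. Because of this, the frames may be stored in the order I P B B. This is why each frame has both a decoding timestamp and a presentation timestamp. The decoding timestamp tells us when to decode something, and the presentation timestamp tells us when to display it. In this case, our stream might look like this: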

PTS: 1 4 2 3

DTS: 1 2 3 4

Stream: I P B B

PTS and DTS generally differ only when the stream contains B-frames.
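To make the I P B B example concrete, here is a small sketch that is not part of the tutorial. It takes the four frames in stream (decode) order and sorts them by PTS to recover the display order. The Frame struct and the sort_by_pts helper are hypothetical names made up for this illustration; a real player gets this ordering from the decoder instead of sorting by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a frame: only the fields needed here. */
typedef struct Frame {
    char type;   /* 'I', 'P' or 'B' */
    long pts;    /* presentation timestamp */
    long dts;    /* decoding timestamp */
} Frame;

/* qsort comparator: ascending PTS. */
static int compare_pts(const void *a, const void *b) {
    const Frame *fa = (const Frame *)a;
    const Frame *fb = (const Frame *)b;
    return (fa->pts > fb->pts) - (fa->pts < fb->pts);
}

/* Reorder frames from decode (stream) order into display order. */
void sort_by_pts(Frame *frames, size_t count) {
    assert(frames != NULL);
    qsort(frames, count, sizeof(Frame), compare_pts);
}
```

Fed the stream I P B B with PTS 1 4 2 3, the sorted result is I B B P, which is the display order described above.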

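From stepping through the code, I found that not every AVPacket has a valid PTS.

When we get a packet from av_read_frame(), it contains the PTS and DTS values for the data inside that packet. But what we really want is the PTS of our newly decoded raw frame, so that we know when to display it. However, the frame we get from avcodec_decode_video() is just an AVFrame, which does not carry a useful PTS value. (Note: AVFrame does have a pts field, but it does not always contain what we want when we get a frame.) However, ffmpeg reorders the packets so that the DTS of the packet being processed by avcodec_decode_video() is always the same as the PTS of the frame it returns. But, another caveat: we will not always get this information either.

Not to worry, because there is another way to find the PTS of a frame: have our program reorder the packets itself. We save the PTS of the first packet of a frame, and that becomes the PTS of the finished frame. We can figure out which packet is the first packet of a frame through avcodec_decode_video(). How? Whenever a packet starts a frame, avcodec_decode_video() calls a function to allocate a buffer for the frame. And of course ffmpeg lets us redefine that allocation function. So we will write a new function that saves the packet's timestamp.

Of course, even then we might not get a proper timestamp. We will deal with that later.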

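Synching

Now, it is all well and good to know when we are supposed to show a particular video frame, but how do we actually do it? Here is the idea: after we show a frame, we figure out when the next frame should be shown. Then we simply set a new timer to refresh the video again after that amount of time. As you might expect, we check the PTS of the next frame against the system clock to see how long the timeout should be. This approach works, but there are two issues to take care of.

First, there is the issue of knowing what the next PTS will be. You might think that we can just add the frame duration (one over the frame rate) to the current PTS, and you would be mostly right. However, some kinds of video call for frames to be repeated, which means we are supposed to show the current frame again a certain number of times. Ignoring this would make the program display the next frame too soon, so we need to account for it.

Second, as the program stands now, the video and the audio just chug along happily, not bothering to sync at all. We would not have to worry about that if everything worked perfectly. But your computer is not perfect, and a lot of video files are not either. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (like your computer's clock). For now, we are going to sync the video to the audio.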

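Coding it: getting the frame PTS

Now let's get into the code and actually do all this. We will need to add some more members to our big struct, but we will do that as we need them. First let's look at the video thread. Remember, this is where we pick up the packets that the decode thread put on the queue. What we need to do in this part of the code is get the PTS of the frame handed to us by avcodec_decode_video. The first way we talked about is taking the DTS of the last packet processed, which is pretty easy:

double pts;

for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
        // means we quit getting packets
        break;
    }
    pts = 0;
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec,
                                pFrame, &frameFinished,
                                packet->data, packet->size);
    if(packet->dts != AV_NOPTS_VALUE) {
        pts = packet->dts;
    } else {
        pts = 0;
    }
    // time_base is the stream's timestamp unit in seconds (the author notes 1/frame_rate, 1/25, for this file)
    pts *= av_q2d(is->video_st->time_base);

We set the PTS to 0 if we cannot figure out what it is.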

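Okay, that was easy. But we said earlier that if the packet's DTS does not help us, we need to use the PTS of the first packet of the frame. We do this by telling ffmpeg to use our own functions to allocate a frame buffer. Here are the function prototypes:

int get_buffer(struct AVCodecContext *c, AVFrame *pic);

void release_buffer(struct AVCodecContext *c, AVFrame *pic);

The get function tells us nothing about packets, so each time we get a packet we save its PTS in a global variable that the get function can read. We then store that value in the AVFrame structure's opaque field, which is a user-defined pointer. So to start, here are our functions: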

uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

/* AV_NOPTS_VALUE plays the role of NULL here. our_get_buffer and
 * our_release_buffer are our own functions. They are assigned to the
 * get_buffer and release_buffer fields of AVCodecContext, so whenever
 * ffmpeg calls get_buffer or release_buffer it runs our code instead. */

int our_get_buffer(struct AVCodecContext *c, AVFrame *pic) {
    int ret = avcodec_default_get_buffer(c, pic);
    uint64_t *pts = av_malloc(sizeof(uint64_t));
    *pts = global_video_pkt_pts;
    pic->opaque = pts;
    return ret;
}

void our_release_buffer(struct AVCodecContext *c, AVFrame *pic) {
    if(pic) av_freep(&pic->opaque);
    avcodec_default_release_buffer(c, pic);
}

The functions avcodec_default_get_buffer and avcodec_default_release_buffer are ffmpeg's default buffer allocation functions. av_freep is a memory management function that not only frees the memory it is given but also sets the pointer to NULL.
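The free-and-reset behaviour described for av_freep can be sketched with the C standard library alone. The free_and_null helper below is a hypothetical name and not an FFmpeg function; it only illustrates why passing the address of the pointer (as in &pic->opaque) lets the callee clear the caller's pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the behaviour described for av_freep:
 * free the block, then reset the caller's pointer to NULL.
 * 'arg' is the ADDRESS of the pointer, passed as void * so that any
 * pointer type can be used without a cast. */
void free_and_null(void *arg) {
    void *block;
    assert(arg != NULL);
    memcpy(&block, arg, sizeof(block));   /* read the pointer stored at arg */
    free(block);                          /* free(NULL) is a harmless no-op */
    block = NULL;
    memcpy(arg, &block, sizeof(block));   /* write NULL back to the caller */
}
```

Because the pointer is NULL afterwards, calling the helper a second time on the same variable is safe, which is the practical benefit over a bare free().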

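Now we go to our stream open function (stream_component_open) and add these lines to tell ffmpeg what to do. (The full listing above assigns our_get_buffer to get_buffer2 instead, with an extra flags parameter.)

codecCtx->get_buffer = our_get_buffer;

codecCtx->release_buffer = our_release_buffer;

Now we have to add code to save the PTS to the global variable and then use the stored PTS when necessary. Our code now looks like this: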

for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
        // means we quit getting packets
        break;
    }
    pts = 0;

    // Save global pts to be stored in pFrame in first call
    global_video_pkt_pts = packet->pts;

    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec, pFrame, &frameFinished,
                                packet->data, packet->size);
    if(packet->dts == AV_NOPTS_VALUE
       && pFrame->opaque && *(uint64_t*)pFrame->opaque != AV_NOPTS_VALUE) {
        pts = *(uint64_t *)pFrame->opaque;
    } else if(packet->dts != AV_NOPTS_VALUE) {
        pts = packet->dts;
    } else {
        pts = 0;
    }
    pts *= av_q2d(is->video_st->time_base);

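A technical note: you may have noticed that we use int64 for the PTS. This is because the PTS is stored as an integer. The value is a timestamp measured in units of the stream's time_base. For example, if a stream has 24 frames per second, a PTS of 42 indicates that the frame belongs where the 42nd frame would be if there were a frame every 1/24 of a second (which is not necessarily true).

We can convert this value to seconds by dividing by the frame rate. For fixed-frame-rate content the stream's time_base may simply be 1/framerate, but containers are free to use other units, so to get the PTS in seconds we multiply by time_base.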
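As a worked example of that conversion, here is a minimal sketch that does not depend on FFmpeg. The Rational struct and pts_to_seconds are hypothetical names that only mimic the numerator/denominator pair of a time base; in the tutorial code the same job is done by multiplying by av_q2d(time_base).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rational type, modelled on the num/den pair of a time base. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Convert an integer timestamp in time_base units to seconds. */
double pts_to_seconds(int64_t pts, Rational time_base) {
    assert(time_base.den != 0);
    return (double)pts * time_base.num / time_base.den;
}
```

With a time base of 1/24, a PTS of 42 comes out as 1.75 seconds. With a millisecond time base of 1/1000, a PTS of 1500 is 1.5 seconds, which shows why the multiplication by time_base is needed even when the unit has nothing to do with the frame rate.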

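Coding: synching using the PTS

So now we have our PTS. Next we need to take care of the two synchronization problems discussed above. We are going to define a function called synchronize_video that updates the PTS to keep it in sync. This function also finally handles the case where we do not get a PTS for a frame. At the same time we need to keep track of when the next frame is expected, so that we can set the refresh rate properly. We do this with an internal video_clock value that tracks how much time has passed according to the video. We add this value to our big struct.

typedef struct VideoState {
    double video_clock; // how much time has passed according to the video

Here is the synchronize_video function, which is fairly self-explanatory: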

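double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {
    double frame_delay;

    if(pts != 0) {
        /* if we have pts, set video clock to it */
        is->video_clock = pts;
    } else {
        /* if we aren't given a pts, set it to the clock */
        pts = is->video_clock;
    }
    /* update the video clock */
    frame_delay = av_q2d(is->video_st->codec->time_base);
    /* if we are repeating a frame, adjust clock accordingly */
    frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
    is->video_clock += frame_delay;
    return pts;
}

You will notice that we account for repeated frames in this function, too.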

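Now let's get our proper PTS and queue up the frame using queue_picture, adding a new pts argument:

// Did we get a video frame?
if(frameFinished) {
    pts = synchronize_video(is, pFrame, pts);
    if(queue_picture(is, pFrame, pts) < 0) {
        break;
    }
}

The only thing that changes about queue_picture is that we save the pts value in the VideoPicture structure that we queue up. So we have to add a pts variable to the struct and add one line of code:

typedef struct VideoPicture {
    ...
    double pts;
} VideoPicture;

int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
    ... stuff ...
    if(vp->bmp) {
        ... convert picture ...
        vp->pts = pts;
        ... alert queue ...
    }
    return 0;
}

So now all the pictures lined up in our picture queue have proper PTS values, so let's look at the video refresh function. You may recall that last time we faked it with a fixed refresh of 80 ms. Now we are going to work out the real value.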

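Our strategy is to predict the time of the next PTS simply by measuring the time between the previous PTS and this one. At the same time, we need to sync the video to the audio. We are going to keep an audio clock: an internal value that tracks the position of the audio we are playing, like the digital readout on any mp3 player. Since we are syncing the video to the audio, the video thread uses this value to figure out whether it is too far ahead or too far behind.

We will get to the implementation later; for now let's assume we have a get_audio_clock function that gives us the time on the audio clock. Once we have that value, what do we do when the video and audio are out of sync? It would be silly to simply try to leap to the correct packet by seeking or something similar. Instead, we adjust the value we have calculated for the next refresh: if the PTS is too far behind the audio time, we refresh as quickly as possible; if the PTS is too far ahead of the audio time, we double our calculated delay. Now that we have our adjusted delay, we compare it with the system clock by way of a running value called frame_timer. The frame timer sums up all of our calculated delays while playing the movie. In other words, frame_timer is the time at which we should display the next frame. We simply add the new delay to the frame timer, compare it to the time on the computer's clock, and use that value to schedule the next refresh. This might be a bit confusing, so study the code carefully: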

void video_refresh_timer(void *userdata) {
    VideoState *is = (VideoState *)userdata;
    VideoPicture *vp;
    double actual_delay, delay, sync_threshold, ref_clock, diff;

    if(is->video_st) {
        if(is->pictq_size == 0) {
            schedule_refresh(is, 1);
        } else {
            vp = &is->pictq[is->pictq_rindex];

            /* gap between this frame's pts and the previous one */
            delay = vp->pts - is->frame_last_pts;
            if(delay <= 0 || delay >= 1.0) {
                /* implausible delay: reuse the previous one */
                delay = is->frame_last_delay;
            }
            /* save for next time */
            is->frame_last_delay = delay;
            is->frame_last_pts = vp->pts;

            /* update delay to sync to audio */
            ref_clock = get_audio_clock(is);
            diff = vp->pts - ref_clock;

            sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
            if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
                if(diff <= -sync_threshold) {
                    /* video is behind audio: show as soon as possible */
                    delay = 0;
                } else if(diff >= sync_threshold) {
                    /* video is ahead of audio: wait twice as long */
                    delay = 2 * delay;
                }
            }
            is->frame_timer += delay;

            /* compute the real delay against the system clock */
            actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
            if(actual_delay < 0.010) {
                actual_delay = 0.010;
            }
            schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));

            /* show the picture */
            video_display(is);

            /* update queue for next picture */
            if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
                is->pictq_rindex = 0;
            }
            SDL_LockMutex(is->pictq_mutex);
            is->pictq_size--;
            SDL_CondSignal(is->pictq_cond);
            SDL_UnlockMutex(is->pictq_mutex);
        }
    } else {
        schedule_refresh(is, 100);
    }
}

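is->frame_timer is the moment at which the next frame should be refreshed (displayed).
is->frame_timer = (double)av_gettime() / 1000000.0; records the moment playback starts. Before each frame is shown, we first work out when that frame should be displayed; only then can we set the timer for the refresh.
actual_delay = is->frame_timer - (av_gettime() / 1000000.0); is the delay we actually need to wait (is->frame_timer is the moment the frame should be shown, av_gettime() / 1000000.0 is the current time, and their difference is the real delay).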
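The last step, turning frame_timer and the current time into a timer value, can be sketched as a small pure function. The name refresh_delay_ms is hypothetical; the 10 ms floor and the rounding are the ones used in video_refresh_timer, and both times are assumed to be in seconds on the same clock.

```c
#include <assert.h>

/* Hypothetical helper: how many milliseconds to wait before the next refresh.
 * frame_timer is when the frame should be shown, now is the current time. */
int refresh_delay_ms(double frame_timer, double now) {
    double actual_delay;
    assert(frame_timer >= 0 && now >= 0);
    actual_delay = frame_timer - now;
    if (actual_delay < 0.010) {
        actual_delay = 0.010;   /* never schedule sooner than 10 ms */
    }
    return (int)(actual_delay * 1000 + 0.5);   /* round to whole milliseconds */
}
```

If the frame is due a quarter of a second from now, the result is 250 ms. If the frame is already late, the difference is negative and the function still returns the 10 ms minimum.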

/*********************************************************************

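Pay attention to is->frame_timer here. It is first assigned in stream_component_open:

is->frame_timer = (double)av_gettime() / 1000000.0; takes the system time as the starting moment for the first frame. After that, the delay of every frame is added to it, so is->frame_timer is always the display time of the current frame.

is->frame_timer += delay;

The program first compares the frame's PTS with the audio clock:

diff = vp->pts - ref_clock;

and then compares against the system time:

is->frame_timer += delay;

actual_delay = is->frame_timer - (av_gettime() / 1000000.0);

We do a lot of checks here. First, we make sure that the delay between the current PTS and the previous PTS makes sense. If it does not, we guess and reuse the last delay. Next, we have a sync threshold, because things are never going to be perfectly in sync; ffplay uses 0.01 for its value. We also make sure that the sync threshold is never smaller than the gap between PTS values. Finally, we set the minimum refresh value to 10 milliseconds.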
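The checks listed above can be pulled out into a pure function to make them easier to experiment with. This is only a sketch: compute_frame_delay is a hypothetical name, AV_SYNC_THRESHOLD uses the 0.01 quoted in the text, and the 10.0 used for AV_NOSYNC_THRESHOLD is an assumption because the text does not give its value.

```c
#include <assert.h>

#define AV_SYNC_THRESHOLD   0.01   /* value quoted in the text */
#define AV_NOSYNC_THRESHOLD 10.0   /* assumed value; not stated in the text */

/* Hypothetical pure function: given this frame's PTS, the previous PTS and
 * delay, and the audio clock (all in seconds), return the delay that would
 * be added to frame_timer. */
double compute_frame_delay(double pts, double last_pts,
                           double last_delay, double audio_clock) {
    double delay, diff, abs_diff, sync_threshold;
    assert(last_delay > 0);
    delay = pts - last_pts;
    if (delay <= 0 || delay >= 1.0) {
        delay = last_delay;            /* implausible gap: reuse the last one */
    }
    diff = pts - audio_clock;          /* > 0: video ahead, < 0: video behind */
    abs_diff = (diff < 0) ? -diff : diff;
    sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
    if (abs_diff < AV_NOSYNC_THRESHOLD) {
        if (diff <= -sync_threshold) {
            delay = 0;                 /* behind audio: show as soon as possible */
        } else if (diff >= sync_threshold) {
            delay = 2 * delay;         /* ahead of audio: wait twice as long */
        }
    }
    return delay;
}
```

With frames 0.5 s apart, a video frame that is one second behind the audio gets a delay of 0, one that is one second ahead gets 1.0, and one that is 100 s off is left alone because the difference is beyond the no-sync threshold.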

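When the actual delay drops below the 10 ms minimum, we really should skip the frame instead, but we are not going to bother.

We added a bunch of variables to the big struct, so don't forget to check the code. Also, don't forget to initialize the frame timer and the initial previous frame delay in stream_component_open:

av_gettime() returns the time in microseconds, so we divide by 1000000 to convert it to seconds.

is->frame_timer = (double)av_gettime() / 1000000.0;

is->frame_last_delay = 40e-3;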

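Syncing: the audio clock

Now it is time to implement the audio clock. We can update the clock time in our audio_decode_frame function, which is where we decode the audio. Remember that we do not always process a new packet every time we call this function, so there are two places where we have to update the clock. The first is where we get a new packet: we simply set the audio clock to the packet's PTS. Then, if a packet contains multiple frames, we keep track of the audio play time by counting the samples and dividing by the sample rate. So once we have the packet:

if(pkt->pts != AV_NOPTS_VALUE) {
    is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
}

And once we are processing the packet:

pts = is->audio_clock;
*pts_ptr = pts;
n = 2 * is->audio_st->codec->channels;   /* bytes per sample frame: 2 bytes (16-bit) per channel */
is->audio_clock += (double)data_size /
    (double)(n * is->audio_st->codec->sample_rate);

A few fine details: the signature of the function has changed to include pts_ptr, so make sure you change that too. pts_ptr is a pointer we use to tell audio_callback the PTS of the current audio packet. It will be used next time for synchronizing the audio with the video.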
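The clock increment is just a byte count converted to seconds. Here is a minimal sketch of that arithmetic, assuming 16-bit samples as in the tutorial's AUDIO_S16SYS setup; audio_bytes_to_seconds is a hypothetical name.

```c
#include <assert.h>

/* Seconds of audio represented by data_size bytes of 16-bit PCM. */
double audio_bytes_to_seconds(int data_size, int channels, int sample_rate) {
    int bytes_per_sample_frame;
    assert(channels > 0 && sample_rate > 0);
    bytes_per_sample_frame = 2 * channels;   /* 2 bytes per 16-bit sample */
    return (double)data_size /
           (double)(bytes_per_sample_frame * sample_rate);
}
```

For 44100 Hz stereo, one second of audio is 44100 * 2 * 2 = 176400 bytes, so a decoded chunk of 88200 bytes advances the audio clock by half a second.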

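Now we can finally implement our get_audio_clock function. It is not as simple as returning is->audio_clock, though. Notice that we set the audio PTS every time we process a packet, but if you look at the audio_callback function, it takes time to move all the data from our audio packet into our output buffer. That means the value in our audio clock could be too far ahead. So we have to check how much we have left to write. Here is the complete code:

double get_audio_clock(VideoState *is) {
    double pts;
    int hw_buf_size, bytes_per_sec, n;

    pts = is->audio_clock;   /* maintained in the audio decoding function */
    hw_buf_size = is->audio_buf_size - is->audio_buf_index;   /* bytes decoded but not yet copied out */
    bytes_per_sec = 0;
    if(is->audio_st) {
        /* check audio_st before dereferencing it */
        n = is->audio_st->codec->channels * 2;   /* 2 bytes (16-bit) per channel */
        bytes_per_sec = is->audio_st->codec->sample_rate * n;
    }
    if(bytes_per_sec) {
        pts -= (double)hw_buf_size / bytes_per_sec;
    }
    return pts;
}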
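A worked example may help here. The sketch below repeats the correction with plain numbers instead of a VideoState; audio_clock_now and its parameters are hypothetical names, and 16-bit samples are assumed as elsewhere in the tutorial.

```c
#include <assert.h>

/* Hypothetical, FFmpeg-free version of the correction in get_audio_clock:
 * subtract the playing time of the bytes still waiting in our buffer. */
double audio_clock_now(double audio_clock, int buf_size, int buf_index,
                       int channels, int sample_rate) {
    int pending_bytes = buf_size - buf_index;         /* not yet handed to SDL */
    int bytes_per_sec = sample_rate * channels * 2;   /* 16-bit samples */
    assert(buf_index >= 0 && buf_index <= buf_size);
    if (bytes_per_sec > 0) {
        audio_clock -= (double)pending_bytes / bytes_per_sec;
    }
    return audio_clock;
}
```

Suppose the audio clock reads 10.0 s, the buffer holds 176400 bytes of 44100 Hz stereo audio, and only half of it has been copied out. The remaining 88200 bytes are worth 0.5 s, so the audio actually being played is at 9.5 s.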