Synching Video

转载 2007年09月15日 14:11:00

Tutorial 05: Synching Video

Code: tutorial05.c

How Video Syncs

So this whole time, we've had an essentially useless movie player. It plays the video, yeah, and it plays the audio, yeah, but it's not quite yet what we would call a movie. So what do we do?

PTS and DTS

Fortunately, both the audio and video streams have the information about how fast and when you are supposed to play them inside of them. Audio streams have a sample rate, and the video streams have a frames per second value. However, if we simply synced the video by just counting frames and multiplying by frame rate, there is a chance that it will go out of sync with the audio. Instead, packets from the stream might have what is called a decoding time stamp (DTS) and a presentation time stamp (PTS). To understand these two values, you need to know about the way movies are stored. Some formats, like MPEG, use what they call "B" frames (B stands for "bidirectional"). The two other kinds of frames are called "I" frames and "P" frames ("I" for "intra" and "P" for "predicted"). I frames contain a full image. P frames depend upon previous I and P frames and are like diffs or deltas. B frames are the same as P frames, but depend upon information found in frames that are displayed both before and after them! This explains why we might not have a finished frame after we call avcodec_decode_video.

So let's say we had a movie, and the frames were displayed like: I B B P. Now, we need to know the information in P before we can display either B frame. Because of this, the frames might be stored like this: I P B B. This is why we have a decoding timestamp and a presentation timestamp on each frame. The decoding timestamp tells us when we need to decode something, and the presentation time stamp tells us when we need to display something. So, in this case, our stream might look like this:

   PTS: 1 4 2 3
   DTS: 1 2 3 4
Stream: I P B B
Generally the PTS and DTS will only differ when the stream we are playing has B frames in it.

 

When we get a packet from av_read_frame(), it will contain the PTS and DTS values for the information inside that packet. But what we really want is the PTS of our newly decoded raw frame, so we know when to display it. However, the frame we get from avcodec_decode_video() gives us an AVFrame, which doesn't contain a useful PTS value. (Warning: AVFrame does contain a pts variable, but this will not always contain what we want when we get a frame.) However, ffmpeg reorders the packets so that the DTS of the packet being processed by avcodec_decode_video() will always be the same as the PTS of the frame it returns. But, another warning: we won't always get this information, either.

Not to worry, because there's another way to find out the PTS of a frame, and we can have our program reorder the packets by itself. We save the PTS of the first packet of a frame: this will be the PTS of the finished frame. So when the stream doesn't give us a DTS, we just use this saved PTS. We can figure out which packet is the first packet of a frame by letting avcodec_decode_video() tell us. How? Whenever a packet starts a frame, the avcodec_decode_video() will call a function to allocate a buffer in our frame. And of course, ffmpeg allows us to redefine what that allocation function is. So we'll make a new function that saves the pts of the packet.

Of course, even then, we might not get a proper pts. We'll deal with that later.

Synching

Now, while it's all well and good to know when we're supposed to show a particular video frame, but how do we actually do so? Here's the idea: after we show a frame, we figure out when the next frame should be shown. Then we simply set a new timeout to refresh the video again after that amount of time. As you might expect, we check the value of the PTS of the next frame against the system clock to see how long our timeout should be. This approach works, but there are two issues that need to be dealt with.

First is the issue of knowing when the next PTS will be. Now, you might think that we can just add the video rate to the current PTS — and you'd be mostly right. However, some kinds of video call for frames to be repeated. This means that we're supposed to repeat the current frame a certain number of times. This could cause the program to display the next frame too soon. So we need to account for that.

The second issue is that as the program stands now, the video and the audio chugging away happily, not bothering to sync at all. We wouldn't have to worry about that if everything worked perfectly. But your computer isn't perfect, and a lot of video files aren't, either. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (like your computer). For now, we're going to sync the video to the audio.

Coding it: getting the frame PTS

Now let's get into the code to do all this. We're going to need to add some more members to our big struct, but we'll do this as we need to. First let's look at our video thread. Remember, this is where we pick up the packets that were put on the queue by our decode thread. What we need to do in this part of the code is get the PTS of the frame given to us by avcodec_decode_video. The first way we talked about was getting the DTS of the last packet processed, which is pretty easy:

  double pts;

  for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
      // means we quit getting packets
      break;
    }
    pts = 0;
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec, 
                                pFrame, &frameFinished, 
				packet->data, packet->size);
    if(packet->dts != AV_NOPTS_VALUE) {
      pts = packet->dts;
    } else {
      pts = 0;
    }
    pts *= av_q2d(is->video_st->time_base);
We set the PTS to 0 if we can't figure out what it is.

 

Well, that was easy. But we said before that if the packet's DTS doesn't help us, we need to use the PTS of the first packet that was decoded for the frame. We do this by telling ffmpeg to use our custom functions to allocate a frame. Here's the types of the functions:

int get_buffer(struct AVCodecContext *c, AVFrame *pic);
void release_buffer(struct AVCodecContext *c, AVFrame *pic);
The get function doesn't tell us anything about packets, so what we'll do is save the PTS to a global variable each time we get a packet, and our get function can just read that. Then, we can store the value in the AVFrame structure's opaque variable. This is a user-defined variable, so we can use it for whatever we want. So first, here are our functions:
uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

/* These are called whenever we allocate a frame
 * buffer. We use this to store the global_pts in
 * a frame at the time it is allocated.
 */
int our_get_buffer(struct AVCodecContext *c, AVFrame *pic) {
  int ret = avcodec_default_get_buffer(c, pic);
  uint64_t *pts = av_malloc(sizeof(uint64_t));
  *pts = global_video_pkt_pts;
  pic->opaque = pts;
  return ret;
}
void our_release_buffer(struct AVCodecContext *c, AVFrame *pic) {
  if(pic) av_freep(&pic->opaque);
  avcodec_default_release_buffer(c, pic);
}
avcodec_default_get_buffer and avcodec_default_release_buffer are the default functions that ffmpeg uses for allocation. av_freep is a memory management function that not only frees the memory pointed to, but sets the pointer to NULL.

 

Now, going way down to our stream open function (stream_component_open), we add these lines to tell ffmpeg what to do:

    codecCtx->get_buffer = our_get_buffer;
    codecCtx->release_buffer = our_release_buffer;
Now we have to add code to save the PTS to the global variable and then use the stored PTS when necessary. Our code now looks like this:
  for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
      // means we quit getting packets
      break;
    }
    pts = 0;

    // Save global pts to be stored in pFrame in first call
    global_video_pkt_pts = packet->pts;
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec, pFrame, &frameFinished, 
				packet->data, packet->size);
    if(packet->dts == AV_NOPTS_VALUE 
       && pFrame->opaque && *(uint64_t*)pFrame->opaque != AV_NOPTS_VALUE) {
      pts = *(uint64_t *)pFrame->opaque;
    } else if(packet->dts != AV_NOPTS_VALUE) {
      pts = packet->dts;
    } else {
      pts = 0;
    }
    pts *= av_q2d(is->video_st->time_base);
A technical note: You may have noticed we're using int64 for the PTS. This is because the PTS is stored as an integer. This value is a timestamp that corresponds to a measurement of time in that stream's time_base unit. For example, if a stream has 24 frames per second, a PTS of 42 is going to indicate that the frame should go where the 42nd frame would be if there we had a frame every 1/24 of a second (certainly not necessarily true).

 

We can convert this value to seconds by dividing by the framerate. The time_base value of the stream is going to be 1/framerate (for fixed-fps content), so to get the PTS in seconds, we multiply by the time_base.

Coding: Synching and using the PTS

So now we've got our PTS all set. Now we've got to take care of the two synchronization problems we talked about above. We're going to define a function called synchronize_video that will update the PTS to be in sync with everything. This function will also finally deal with cases where we don't get a PTS value for our frame. At the same time we need to keep track of when the next frame is expected so we can set our refresh rate properly. We can accomplish this by using an internal video_clock value which keeps track of how much time has passed according to the video. We add this value to our big struct.

typedef struct VideoState {
  double          video_clock; ///
Here's the synchronize_video function, which is pretty self-explanatory:
double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {

  double frame_delay;

  if(pts != 0) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  frame_delay = av_q2d(is->video_st->codec->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}
You'll notice we account for repeated frames in this function, too.

 

Now let's get our proper PTS and queue up the frame using queue_picture, adding a new pts argument:

    // Did we get a video frame?
    if(frameFinished) {
      pts = synchronize_video(is, pFrame, pts);
      if(queue_picture(is, pFrame, pts) < 0) {
	break;
      }
    }
The only thing that changes about queue_picture is that we save that pts value to the VideoPicture structure that we queue up. So we have to add a pts variable to the struct and add a line of code:
typedef struct VideoPicture {
  ...
  double pts;
}
int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
  ... stuff ...
  if(vp->bmp) {
    ... convert picture ...
    vp->pts = pts;
    ... alert queue ...
  }
So now we've got pictures lining up onto our picture queue with proper PTS values, so let's take a look at our video refreshing function. You may recall from last time that we just faked it and put a refresh of 80ms. Well, now we're going to find out how to actually figure it out.

 

Our strategy is going to be to predict the time of the next PTS by simply measuring the time between the previous pts and this one. At the same time, we need to sync the video to the audio. We're going to make an audio clock: an internal value thatkeeps track of what position the audio we're playing is at. It's like the digital readout on any mp3 player. Since we're synching the video to the audio, the video thread uses this value to figure out if it's too far ahead or too far behind.

We'll get to the implementation later; for now let's assume we have a get_audio_clock function that will give us the time on the audio clock. Once we have that value, though, what do we do if the video and audio are out of sync? It would silly to simply try and leap to the correct packet through seeking or something. Instead, we're just going to adjust the value we've calculated for the next refresh: if the PTS is too far behind the audio time, we double our calculated delay. if the PTS is too far ahead of the audio time, we simply refresh as quickly as possible. Now that we have our adjusted refresh time, or delay, we're going to compare that with our computer's clock by keeping a running frame_timer. This frame timer will sum up all of our calculated delays while playing the movie. In other words, this frame_timer is what time it should be when we display the next frame. We simply add the new delay to the frame timer, compare it to the time on our computer's clock, and use that value to schedule the next refresh. This might be a bit confusing, so study the code carefully:

void video_refresh_timer(void *userdata) {

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;
  
  if(is->video_st) {
    if(is->pictq_size == 0) {
      schedule_refresh(is, 1);
    } else {
      vp = &is->pictq[is->pictq_rindex];

      delay = vp->pts - is->frame_last_pts; /* the pts from last time */
      if(delay <= 0 || delay >= 1.0) {
	/* if incorrect delay, use previous one */
	delay = is->frame_last_delay;
      }
      /* save for next time */
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      /* update delay to sync to audio */
      ref_clock = get_audio_clock(is);
      diff = vp->pts - ref_clock;

      /* Skip or repeat the frame. Take delay into account
	 FFPlay still doesn't "know if this is the best guess." */
      sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
	if(diff <= -sync_threshold) {
	  delay = 0;
	} else if(diff >= sync_threshold) {
	  delay = 2 * delay;
	}
      }
      is->frame_timer += delay;
      /* computer the REAL delay */
      actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
      if(actual_delay < 0.010) {
	/* Really it should skip the picture instead */
	actual_delay = 0.010;
      }
      schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
      /* show the picture! */
      video_display(is);
      
      /* update queue for next picture! */
      if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
	is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  } else {
    schedule_refresh(is, 100);
  }
}
There are a few checks we make: first, we make sure that the delay between the PTS and the previous PTS make sense. If it doesn't we just guess and use the last delay. Next, we make sure we have a synch threshold because things are never going to be perfectly in synch. ffplay uses 0.01 for its value. We also make sure that the synch threshold is never smaller than the gaps in between PTS values. Finally, we make the minimum refresh value 10 milliseconds*.* Really here we should skip the frame, but we're not going to bother.

 

We added a bunch of variables to the big struct so don't forget to check the code. Also, don't forget to initialize the frame timer and the initial previous frame delay in stream_component_open:

    is->frame_timer = (double)av_gettime() / 1000000.0;
    is->frame_last_delay = 40e-3;

 

Synching: The Audio Clock

Now it's time for us to implement the audio clock. We can update the clock time in our audio_decode_frame function, which is where we decode the audio. Now, remember that we don't always process a new packet every time we call this function, so there are two places we have to update the clock at. The first place is where we get the new packet: we simply set the audio clock to the packet's PTS. Then if a packet has multiple frames, we keep time the audio play by counting the number of samples and multiplying them by the given samples-per-second rate. So once we have the packet:

    /* if update, update the audio clock w/pts */
    if(pkt->pts != AV_NOPTS_VALUE) {
      is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
    }
And once we are processing the packet:
      /* Keep audio_clock up-to-date */
      pts = is->audio_clock;
      *pts_ptr = pts;
      n = 2 * is->audio_st->codec->channels;
      is->audio_clock += (double)data_size /
	(double)(n * is->audio_st->codec->sample_rate);
A few fine details: the template of the function has changed to include pts_ptr, so make sure you change that. pts_ptr is a pointer we use to inform audio_callback the pts of the audio packet. This will be used next time for synchronizing the audio with the video.

 

Now we can finally implement our get_audio_clock function. It's not as simple as getting the is->audio_clock value, thought. Notice that we set the audio PTS every time we process it, but if you look at the audio_callback function, it takes time to move all the data from our audio packet into our output buffer. That means that the value in our audio clock could be too far ahead. So we have to check how much we have left to write. Here's the complete code:

double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size, bytes_per_sec, n;
  
  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  n = is->audio_st->codec->channels * 2;
  if(is->audio_st) {
    bytes_per_sec = is->audio_st->codec->sample_rate * n;
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}
You should be able to tell why this function works by now ;)

 

So that's it! Go ahead and compile it:

gcc -o tutorial05 tutorial05.c -lavutil -lavformat -lavcodec -lz -lm`sdl-config --cflags --libs`
and finally! you can watch a movie on your own movie player. Next time we'll look at audio synching, and then the tutorial after that we'll talk about seeking.

 

 >> Synching Audio

 

 

转自http://www.dranger.com/ffmpeg/tutorial05.html

An ffmpeg and SDL Tutorial Tutorial 05: Synching Video

Tutorial 05: Synching VideoCode: tutorial05.cHow Video SyncsSo this whole time, we've had an essenti...

H5 video 标签 播放事件 视频加载完成事件 获取视频播放进度时间

html 代码                                                                                           ...

Chromium为视频标签<video>创建播放器的过程分析

Chromium是通过WebKit解析网页内容的。当WebKit遇到标签时,就会创建一个播放器实例。WebKit是平台无关的,而播放器实现是平台相关的。因此,WebKit并没有自己实现播放器,而仅仅是...

深入理解l内核v4l2框架之video for linux 2(二)

3、video_device struct video_device{ #if defined(CONFIG_MEDIA_CONTROLLER) struct media_entity enti...

OpenCV 视频监控(Video Surveilance)的算法体系

如前面说到的,OpenCV VS提供了6组算法的接口,分别是:前景检测、新目标检测、目标跟踪、轨迹生成、跟踪后处理、轨迹分析,除了轨迹生成用于轨迹数据的保存以外,其他5个部分都是标准的视频监控算法体系...

自然图像抠图/视频抠像技术发展情况梳理(image matting, alpha matting, video matting)--计算机视觉专题1

自然图像抠图/视频抠像技术发展情况梳理 Sason@CSDN 持续更新. 当前更新日期2013.03.05, 添加Fast Mating、Global Matting、视频扣像。 当前更新日期2...

avcodec_decode_video2()解码视频后丢帧的问题解决

使用libav转码视频时发现一个问题:使用下面这段代码解码视频时,视频尾巴上会丢掉几帧。 [cpp] view plaincopy while(av...

编解码学习笔记(十一):Flash Video系列

用于在 Flash 中压缩视频。FLV流媒体格式是一种新的视频格式,它的出现有效地解决了视频文件导入Flash后,使导出的SWF文件体积庞大,不能在网络上有效使用等 缺点。一般FLV文件包在SWF P...

微信小程序开发之视频播放器 Video 弹幕 弹幕颜色自定义

把录音的模块尝试过之后就想着微信小程序的视频播放会不会更有趣? 果然,微信小程序视频自带弹幕.是不是很爽,跟我一起来看看. 微信小程序开发之录音机 音频播放 动画 (真机可用) 先上gif: ...

HTML5移动开发之路(11)——Video标签详解

在前面的小强的HTML5移动开发之路(5)——制作一个漂亮的视频播放器中制作了一个非常好用的播放器,有的朋友对其中的原理还不是很了解,这一篇文章将在前一篇的基础上深入剖析标签的使用。 一、使用技...
内容举报
返回顶部
收藏助手
不良信息举报
您举报文章:Synching Video
举报原因:
原因补充:

(最多只允许输入30个字)