ffplay.c Study Notes 7: Audio as the Sync Master

Latest recommended article published 2023-07-08 10:26:28

Lumos`



Category: FFmpeg

Article link: https://blog.csdn.net/weixin_41910694/article/details/116899491


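1. Audio main flow

ffplay also uses this synchronization strategy by default. In this mode the audio clock is set in sdl_audio_callback: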

	audio_callback_time = av_gettime_relative();
	...
    /* Let's assume the audio driver that is used by SDL has two periods. */
    if (!isnan(is->audio_clock)) {
        set_clock_at(&is->audclk, is->audio_clock -
                                  (double)(2 * is->audio_hw_buf_size + is->audio_write_buf_size)
                                  / is->audio_tgt.bytes_per_sec,
                     is->audio_clock_serial,
                     audio_callback_time / 1000000.0);
        sync_clock_to_slave(&is->extclk, &is->audclk);
    }

Maintaining the audio clock
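First, is->audio_clock is assigned in audio_decode_frame: is->audio_clock = af->pts + (double) af->frame->nb_samples / af->frame->sample_rate;. This shows that the timestamp refers to the end of audio_buf, not its start. So when part of audio_buf is still left over (the remaining length is recorded in audio_write_buf_size), the pts of the data actually being handed to the device is is->audio_clock - (double)(is->audio_write_buf_size) / is->audio_tgt.bytes_per_sec.
Next, the 2 * audio_hw_buf_size bytes sitting in the hardware buffers have not actually been played yet either (the code comment assumes the SDL driver uses two periods), which gives is->audio_clock - (double)(2 * is->audio_hw_buf_size + is->audio_write_buf_size) / is->audio_tgt.bytes_per_sec.
Finally, because the buffer is filled inside the SDL callback, that data starts playing around the moment the callback fires. The reference time passed to set_clock_at is therefore audio_callback_time, captured at the start of the callback; while the device keeps playing, the clock advances with system time from that reference point.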
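To make the subtraction concrete, here is a small sketch of the same arithmetic. audio_clock_at_callback is a hypothetical helper (not an ffplay function), and the sample figures are assumptions: 44.1 kHz stereo S16 output (176400 bytes/s), an 8192-byte hardware buffer and 4096 bytes left in audio_buf.

```c
#include <math.h>

/* Hypothetical helper mirroring the expression passed to set_clock_at() in
 * sdl_audio_callback(). audio_clock_end is the PTS of the END of the data in
 * audio_buf, i.e. what audio_decode_frame() stores in is->audio_clock. */
static double audio_clock_at_callback(double audio_clock_end,
                                      int hw_buf_size,    /* bytes per SDL hardware buffer */
                                      int write_buf_size, /* bytes still left in audio_buf */
                                      int bytes_per_sec)  /* e.g. 44100 Hz * 2 ch * 2 bytes */
{
    /* Assume two hardware periods are still queued in the driver, plus what
     * is left in audio_buf: none of these bytes has been heard yet. */
    double unplayed_bytes = 2.0 * hw_buf_size + write_buf_size;
    return audio_clock_end - unplayed_bytes / bytes_per_sec;
}
```

With those inputs it returns roughly 9.884 s for an audio_buf ending at 10.0 s: about 116 ms of decoded audio has not been heard yet when the callback fires.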

static void sdl_audio_callback(void *opaque, Uint8 *stream, int len)
{
    VideoState *is = opaque;
    int audio_size, len1;

    audio_callback_time = av_gettime_relative();
    ...   /* fill stream from audio_buf, then update is->audclk as shown above */
}

The final call is therefore:

set_clock_at(&is->audclk, is->audio_clock - (double)(2 * is->audio_hw_buf_size + is->audio_write_buf_size) / is->audio_tgt.bytes_per_sec, is->audio_clock_serial, audio_callback_time / 1000000.0);
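Why the callback time is passed along: ffplay's Clock stores a pts together with the system time at which it was valid, and get_clock() extrapolates from that point, so the audio clock keeps advancing between callbacks. The following is a stripped-down sketch of that idea, assuming speed 1.0 and ignoring the serial and pause handling the real get_clock() performs; SimpleClock and the simple_* functions are hypothetical names.

```c
#include <math.h>

/* Simplified illustration of the clock idea (speed fixed at 1.0, no serial
 * or pause handling). Field names follow ffplay's Clock, but this is not
 * ffplay's code. */
typedef struct {
    double pts;          /* clock value at last_updated */
    double pts_drift;    /* pts - last_updated */
    double last_updated; /* system time (seconds) when pts was set */
} SimpleClock;

static void simple_set_clock_at(SimpleClock *c, double pts, double time)
{
    c->pts = pts;
    c->last_updated = time;
    c->pts_drift = pts - time;
}

/* Between updates the clock advances together with system time. */
static double simple_get_clock(const SimpleClock *c, double now)
{
    return c->pts_drift + now;
}
```

For example, after simple_set_clock_at(&clk, 5.0, 100.0), reading the clock at system time 100.5 yields 5.5.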

2. Video main flow

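The main scheme ffplay uses to synchronize video to audio is: if video is playing too fast, repeat the previous frame to wait for the audio; if video is playing too slowly, drop frames to catch up with the audio.
This logic lives in the video output function video_refresh. Before analyzing the code, let's review the flowchart of this function:
In this flow, the step "compute the display duration of the previous frame" is crucial. Let's look at the code first: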

/* called to display each frame */
/* Called repeatedly while not paused, or when a refresh is forced */
static void video_refresh(void *opaque, double *remaining_time)
{
    ...
            /* compute nominal last_duration */
            // lastvp: previous frame, vp: current frame, nextvp: next frame
            // last_duration: how long the previous frame should be displayed
            last_duration = vp_duration(is, lastvp, vp);

            // compute_target_delay() computes how long to wait before displaying vp.
            // With video as the master clock, delay is simply last_duration.
            // With audio or the external clock as master, the wait is adjusted against the master clock.
            delay = compute_target_delay(last_duration, is);

            time= av_gettime_relative()/1000000.0;
            // is->frame_timer is the time at which the previous frame lastvp started being displayed;
            // is->frame_timer + delay is the time at which vp should be displayed
            if (time < is->frame_timer + delay) { // keep displaying the previous frame?
                // The system time has not yet reached the end of lastvp's display interval,
                // so lastvp stays on screen.
                // Compute the minimum wait time
                *remaining_time = FFMIN(is->frame_timer + delay - time, *remaining_time);
                goto display;
            }

            // Reaching here means vp's display time has arrived or passed: vp becomes the frame to display

            is->frame_timer += delay;   // update the display time of the current frame
            if (delay > 0 && time - is->frame_timer > AV_SYNC_THRESHOLD_MAX) {
                is->frame_timer = time; // if it lags the system time too much, reset it to the system time
            }
            SDL_LockMutex(is->pictq.mutex);
            if (!isnan(vp->pts))
                update_video_pts(is, vp->pts, vp->pos, vp->serial); // update the video clock
            SDL_UnlockMutex(is->pictq.mutex);
            // frame-drop logic
            if (frame_queue_nb_remaining(&is->pictq) > 1) { // only consider dropping when nextvp exists
                Frame *nextvp = frame_queue_peek_next(&is->pictq);
                duration = vp_duration(is, vp, nextvp);
                if(!is->step        // only drop outside frame-step mode (is->step == 1 means step-by-step playback)
                   && (framedrop>0 ||      // dropping forced on (-framedrop: "drop frames when cpu is too slow")
                       (framedrop && get_master_sync_type(is) != AV_SYNC_VIDEO_MASTER)) // master clock is not video
                   && time > is->frame_timer + duration // really behind by at least one frame
                        ) {
                    printf("%s(%d) dif:%lfs, drop frame\n", __FUNCTION__, __LINE__,
                           (is->frame_timer + duration) - time);
                    is->frame_drops_late++;             // count late frame drops
                    frame_queue_next(&is->pictq);       // the actual drop happens here
                    // (we cannot simply drop in a while loop: the audio clock may have been re-synced, so delay must be recomputed)
                    goto retry; // go back to the start of the function and retry
                }
            }
    ...
}

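The logic of this code is covered by the flowchart above. The main idea is the one stated at the beginning:
1. If video is playing too fast, repeat the previous frame to wait for the audio;
2. If video is playing too slowly, drop frames to catch up with the audio. The implementation uses the audio clock to compute how much longer the previous frame (the picture currently on screen) should be displayed (including the frame's own duration), then compares this with the system time to decide whether it is time to show the next frame.
The comparison with the system time introduces another concept: frame_timer. It can be understood as the frame display time: before the update it is the display time of the previous frame lastvp; after the update (is->frame_timer += delay) it is the display time of the current frame vp. The display time of the previous frame plus delay (how much longer it should be shown, including its own duration) is the time at which the previous frame should stop being displayed. See the following diagram for the principle:
The diagram shows three cases:
1. time1: the system time is earlier than the end of lastvp's display interval (frame_timer + delay), i.e. the dashed-circle position. lastvp should continue to be displayed.
2. time2: the system time is later than the end of lastvp's display interval but earlier than the end of vp's (vp is displayed from the dashed circle to the black circle). lastvp is not repeated and vp is not dropped: vp should be displayed.
3. time3: the system time is later than the end of vp's display interval (the black circle, which is also when nextvp is expected to start). vp should be dropped.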
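The three cases can be written as a small decision function. This is a simplified sketch, not ffplay code: classify_refresh and RefreshAction are hypothetical names, it assumes nextvp exists and frame dropping is enabled, and it ignores step mode and the frame_timer reset (when frame_timer lags the system time by more than AV_SYNC_THRESHOLD_MAX, ffplay snaps it to the current time, which also suppresses the drop in that pass).

```c
/* Hypothetical classification of the time1/time2/time3 cases.
 * frame_timer: when lastvp started being displayed
 * delay:       how long lastvp should stay (compute_target_delay result)
 * duration:    vp's own duration (vp_duration(is, vp, nextvp)) */
typedef enum { KEEP_LASTVP, SHOW_VP, DROP_VP } RefreshAction;

static RefreshAction classify_refresh(double time, double frame_timer,
                                      double delay, double duration)
{
    if (time < frame_timer + delay)
        return KEEP_LASTVP;        /* time1: lastvp's slot is not over yet */
    frame_timer += delay;          /* vp's slot starts here */
    if (time > frame_timer + duration)
        return DROP_VP;            /* time3: already past the end of vp's slot */
    return SHOW_VP;                /* time2: inside vp's slot */
}
```

With frame_timer = 0 and 40 ms frames (delay = duration = 0.04), time 0.02 keeps lastvp, 0.06 shows vp, and 0.09 drops vp.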

3. Computing delay

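Next comes the most crucial part: how lastvp's display duration delay is computed (it is not easy to grasp and is worth going over several times). This is implemented in compute_target_delay: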

/**
 * @brief Compute how long the currently displayed frame should keep being shown.
 * @param delay In practice, the interval between the currently displayed frame and the frame to be shown next.
 * @param is
 * @return How long the currently displayed frame should keep being shown. Why adjust the returned delay? Why not simply use the interval between adjacent frames?
 */
static double compute_target_delay(double delay, VideoState *is)
{
    double sync_threshold, diff = 0;

    /* update delay to follow master synchronisation source */
    /* If the master clock is not video, compute the difference between the video clock and the master clock */
    if (get_master_sync_type(is) != AV_SYNC_VIDEO_MASTER) {
        /* if video is slave, we try to correct big delays by
           duplicating or deleting a frame */
        diff = get_clock(&is->vidclk) - get_master_clock(is);

        /* skip or repeat frame. We take into account the
           delay to compute the threshold. I still don't know
           if it is the best guess */
        sync_threshold = FFMAX(AV_SYNC_THRESHOLD_MIN,
                               FFMIN(AV_SYNC_THRESHOLD_MAX, delay));
        if (!isnan(diff) && fabs(diff) < is->max_frame_duration) { // diff is within the maximum frame duration
            if (diff <= -sync_threshold) {      // video is behind
                delay = FFMAX(0, delay + diff); // shorten the wait, but never below 0
            }
            else if (diff >= sync_threshold && delay > AV_SYNC_FRAMEDUP_THRESHOLD) {
                // video is ahead
                // AV_SYNC_FRAMEDUP_THRESHOLD is 0.1; if delay > 0.1, waiting 2*delay would be rather long
                delay = delay + diff;
            }
            else if (diff >= sync_threshold) {
                delay = 2 * delay; // stays within 2 * AV_SYNC_FRAMEDUP_THRESHOLD, i.e. 2*0.1 = 0.2 s
            } else {
                // otherwise delay = delay: keep the original delay and rely on comparing frame_timer + duration with the current time
            }
        }
    } else {
        // with video as the master clock, last_duration is returned unchanged
    }

    av_log(NULL, AV_LOG_TRACE, "video: delay=%0.3f A-V=%f\n",
           delay, -diff);

    return delay;
}

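The hardest part of this code to understand is sync_threshold. Its value is FFMAX(AV_SYNC_THRESHOLD_MIN, FFMIN(AV_SYNC_THRESHOLD_MAX, delay)), where delay is the passed-in time for which the previous frame should keep being displayed (essentially the frame duration). This gives three cases:
1. delay > AV_SYNC_THRESHOLD_MAX = 0.1 s: sync_threshold = 0.1 s
2. delay < AV_SYNC_THRESHOLD_MIN = 0.04 s: sync_threshold = 0.04 s
3. AV_SYNC_THRESHOLD_MIN = 0.04 s <= delay <= AV_SYNC_THRESHOLD_MAX = 0.1 s: sync_threshold is delay itself.
So sync_threshold is at most 0.1 s and at least 0.04 s. What does this tell us?
1. The best sync accuracy is -0.04 s to +0.04 s;
2. The worst sync accuracy is -0.1 s to +0.1 s;
It also depends on the frame rate of the video: when the frame interval delay (frame duration) is between 0.04 and 0.1 s, the sync accuracy is plus or minus one frame.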
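A quick way to see how sync_threshold tracks the frame rate is to evaluate the clamp for a few frame durations. sync_threshold_for is a hypothetical helper; the constants are the values quoted above.

```c
#include <math.h>

#define AV_SYNC_THRESHOLD_MIN 0.04
#define AV_SYNC_THRESHOLD_MAX 0.1

static double max_d(double a, double b) { return a > b ? a : b; }
static double min_d(double a, double b) { return a < b ? a : b; }

/* Same clamp as compute_target_delay(): FFMAX(MIN, FFMIN(MAX, delay)) */
static double sync_threshold_for(double frame_duration)
{
    return max_d(AV_SYNC_THRESHOLD_MIN,
                 min_d(AV_SYNC_THRESHOLD_MAX, frame_duration));
}
```

At 60 fps (about 16.7 ms per frame) the threshold is pinned at 0.04 s, i.e. more than two frames; at 24 fps it equals the frame duration (about 41.7 ms); at 5 fps (0.2 s per frame) it is capped at 0.1 s, half a frame.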
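A diagram helps:
In the diagram:
1. The axis is the value of diff; diff = 0 means the video clock and the audio clock are identical, i.e. perfectly in sync.
2. The colored blocks below the axis show the value to be returned; delay in the blocks is the passed-in parameter, which, following the code in the previous section, is lastvp's display duration (frame duration).
  The diagram shows that sync_threshold defines a region within which lastvp's display duration need not be adjusted and delay is returned as is. Within this region the streams are considered in sync (sync_threshold is also the maximum tolerated sync error).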

Sync decision results:

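diff <= -sync_threshold: video is playing too slowly and some frames should be dropped. The return value is FFMAX(0, delay + diff), i.e. a value no smaller than 0 and smaller than delay. In terms of the frame_timer diagram above, the picture should at least be advanced to vp.
diff >= sync_threshold && delay > AV_SYNC_FRAMEDUP_THRESHOLD: diff exceeds sync_threshold and delay also exceeds AV_SYNC_FRAMEDUP_THRESHOLD, so delay + diff is returned and the actual diff decides how much longer to display. (At first glance there seems to be no need to distinguish this from returning 2*delay; the comment in the code hints at the reason: when delay is already above 0.1 s, doubling it would hold the frame much longer than the measured lead diff requires, so only diff is added.)
1. In this branch the frame interval delay > AV_SYNC_FRAMEDUP_THRESHOLD = 0.1 s, so sync_threshold = 0.1 s, and delay + diff > 0.1 + diff >= 0.1 + 0.1 = 0.2 s.
diff >= sync_threshold: video is playing too fast and lastvp should be repeated. The return value is 2*delay, i.e. twice lastvp's display duration, which keeps lastvp on screen for one more frame. In this branch delay <= 0.1 s necessarily, so 2*delay <= 0.2 s.
-sync_threshold < diff < +sync_threshold: within the allowed error, video is displayed according to the frame duration, i.e. delay is returned.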
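The four outcomes can be checked with a standalone version of the branch logic. target_delay is a hypothetical function that takes diff and max_frame_duration as plain arguments instead of reading them from VideoState; it assumes the master clock is not video.

```c
#include <math.h>

#define AV_SYNC_THRESHOLD_MIN 0.04
#define AV_SYNC_THRESHOLD_MAX 0.1
#define AV_SYNC_FRAMEDUP_THRESHOLD 0.1

static double max_d(double a, double b) { return a > b ? a : b; }
static double min_d(double a, double b) { return a < b ? a : b; }

/* diff = video clock - master clock (seconds) */
static double target_delay(double delay, double diff, double max_frame_duration)
{
    double sync_threshold = max_d(AV_SYNC_THRESHOLD_MIN,
                                  min_d(AV_SYNC_THRESHOLD_MAX, delay));

    if (isnan(diff) || fabs(diff) >= max_frame_duration)
        return delay;                      /* diff unusable: keep nominal duration */
    if (diff <= -sync_threshold)
        return max_d(0.0, delay + diff);   /* video behind: shorten, never below 0 */
    if (diff >= sync_threshold && delay > AV_SYNC_FRAMEDUP_THRESHOLD)
        return delay + diff;               /* video ahead, long frame: wait an extra diff */
    if (diff >= sync_threshold)
        return 2 * delay;                  /* video ahead, short frame: show lastvp twice */
    return delay;                          /* inside the quasi-sync band */
}
```

For a 25 fps stream (delay = 0.04 s): diff = -0.05 returns 0 (advance to vp immediately), diff = 0.05 returns 0.08 (show lastvp twice), and a 0.2 s frame with diff = 0.15 returns 0.35.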
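That essentially completes the analysis of syncing video to audio. To summarize:
1. The basic strategy: if video plays too fast, repeat the previous frame to wait for the audio; if video plays too slowly, drop frames to catch up with the audio.
2. The strategy is implemented by introducing frame_timer, which marks when a frame starts and should stop being displayed; comparing it with the system time decides whether to repeat or drop.
3. The time at which lastvp should stop being displayed takes into account not only the frame's own duration but also the difference between the video clock and the audio clock.
4. Synchronization is not enforced at every instant; there is a "quasi-sync" band of tolerated difference.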

2. Video as the sync master

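Video is used as the master clock when it is selected explicitly (ffplay's -sync video option) and the file has a video stream. (With ffplay's default audio master, a file without an audio stream falls back to the external clock, not to video.)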
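In the "video syncs to audio" strategy, we catch up with or wait for the audio clock by dropping or repeating frames, but "audio syncs to video" cannot be handled that simply.
In audio output, the smallest unit is the sample. Audio is usually stored as digital sample values, with common sampling rates of 44.1 kHz, 48 kHz, etc., i.e. 44100 or 48000 samples per second. The closest video counterpart of a sample is a picture frame: a 24 fps (frames per second) video outputs 24 pictures per second, and one picture plays the role of one audio sample. Clearly, applying the same drop (drop samples) and repeat scheme to audio is not sound. (Audio is far more continuous than video; repeating or dropping several hundred samples to reach sync causes clearly audible discontinuities.)
In essence, audio uses resampling compensation: if audio is behind, the resampled output has fewer samples than normal so the next frame is reached sooner; if audio is ahead, the resampled output has more samples than normal so playback is slower.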

1. Video main flow

video_refresh() -> update_video_pts(): frames are played at their own frame intervals, and the video clock is re-corrected in real time. The focus here is on audio playback.

2. Audio main flow

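Before analyzing the specific compensation method, let's review the audio output flow.
The main model of audio output is:
When audio_buf runs short, audio_decode_frame takes data from the FrameQueue and puts it into audio_buf. audio_decode_frame contains the A/V sync control code: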

/**
 * Decode one audio frame and return its uncompressed size.
 *
 * The processed audio frame is decoded, converted if required, and
 * stored in is->audio_buf, with size in bytes given by the return
 * value.
 */
static int audio_decode_frame(VideoState *is)
{
    ...
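    // Get the corrected sample count: if audio is the master clock, it is unchanged; otherwise it is adjusted as sync requires
    wanted_nb_samples = synchronize_audio(is, af->frame->nb_samples);
    // is->audio_tgt holds the output audio parameters SDL accepts, obtained in audio_open()
    // right after audio_open() returns, ffplay sets "is->audio_src = is->audio_tgt"
    // Meaning: if the frame's audio parameters == is->audio_src == is->audio_tgt,
    // resampling is skipped (is->swr_ctx is NULL in that case).
    // Otherwise is->swr_ctx is set up from the frame (source) and is->audio_tgt (target) parameters,
    // and is->audio_src is updated with the frame's audio parameters
    if (af->frame->format           != is->audio_src.fmt            || // sample format
        dec_channel_layout      != is->audio_src.channel_layout || // channel layout
        af->frame->sample_rate  != is->audio_src.freq           || // sample rate
        // 4th condition: changing the sample count requires initializing the resampler
        (wanted_nb_samples      != af->frame->nb_samples && !is->swr_ctx) // sample count differs and swr_ctx is not initialized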
            ) {
        swr_free(&is->swr_ctx);
        is->swr_ctx = swr_alloc_set_opts(NULL,
                                         is->audio_tgt.channel_layout,  // target (output)
                                         is->audio_tgt.fmt,
                                         is->audio_tgt.freq,
                                         dec_channel_layout,            // source
                                         af->frame->format,
                                         af->frame->sample_rate,
                                         0, NULL);
        if (!is->swr_ctx || swr_init(is->swr_ctx) < 0) {
            av_log(NULL, AV_LOG_ERROR,
                   "Cannot create sample rate converter for conversion of %d Hz %s %d channels to %d Hz %s %d channels!\n",
                   af->frame->sample_rate, av_get_sample_fmt_name(af->frame->format), af->frame->channels,
                   is->audio_tgt.freq, av_get_sample_fmt_name(is->audio_tgt.fmt), is->audio_tgt.channels);
            swr_free(&is->swr_ctx);
            return -1;
        }
        is->audio_src.channel_layout = dec_channel_layout;
        is->audio_src.channels       = af->frame->channels;
        is->audio_src.freq = af->frame->sample_rate;
        is->audio_src.fmt = af->frame->format;
    }

    if (is->swr_ctx) {
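        // Resampling input 1: the number of input samples is af->frame->nb_samples
        // Resampling input 2: the input audio buffer
        const uint8_t **in = (const uint8_t **)af->frame->extended_data; // data[0] data[1]

        // Resampling output 1: the output audio buffer
        uint8_t **out = &is->audio_buf1; // audio_buf1 is the buffer actually allocated; audio_buf is pointed at it later
        // Resampling output 2: output buffer capacity in samples per channel (with 256 samples of headroom)
        int out_count = (int64_t)wanted_nb_samples * is->audio_tgt.freq / af->frame->sample_rate
                        + 256;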

        int out_size  = av_samples_get_buffer_size(NULL, is->audio_tgt.channels,
                                                   out_count, is->audio_tgt.fmt, 0);
        int len2;
        if (out_size < 0) {
            av_log(NULL, AV_LOG_ERROR, "av_samples_get_buffer_size() failed\n");
            return -1;
        }
        // True if the frame's sample count was corrected
        if (wanted_nb_samples != af->frame->nb_samples) {
            int sample_delta = (wanted_nb_samples - af->frame->nb_samples) * is->audio_tgt.freq
                               / af->frame->sample_rate;
            int compensation_distance = wanted_nb_samples * is->audio_tgt.freq / af->frame->sample_rate;
            // spread sample_delta over compensation_distance output samples
            if (swr_set_compensation(is->swr_ctx,
                                     sample_delta,
                                     compensation_distance) < 0) {
                av_log(NULL, AV_LOG_ERROR, "swr_set_compensation() failed\n");
                return -1;
            }
        }
        av_fast_malloc(&is->audio_buf1, &is->audio_buf1_size, out_size);
        if (!is->audio_buf1)
            return AVERROR(ENOMEM);
        // Resample: the return value is the number of samples per channel in the resampled output
        len2 = swr_convert(is->swr_ctx, out, out_count, in, af->frame->nb_samples);
        if (len2 < 0) {
            av_log(NULL, AV_LOG_ERROR, "swr_convert() failed\n");
            return -1;
        }
        if (len2 == out_count) {
            av_log(NULL, AV_LOG_WARNING, "audio buffer is probably too small\n");
            if (swr_init(is->swr_ctx) < 0)
                swr_free(&is->swr_ctx);
        }
        is->audio_buf = is->audio_buf1;
        // size in bytes of the resampled audio frame
        resampled_data_size = len2 * is->audio_tgt.channels * av_get_bytes_per_sample(is->audio_tgt.fmt);
    } else {
        // No resampling: point audio_buf directly at the frame's audio data
        is->audio_buf = af->frame->data[0]; // packed s16: data[0]; planar fltp: data[0], data[1]
        resampled_data_size = data_size;
    }

    audio_clock0 = is->audio_clock;
    /* update the audio clock with the pts */
    if (!isnan(af->pts))
        is->audio_clock = af->pts + (double) af->frame->nb_samples / af->frame->sample_rate;
    else
        is->audio_clock = NAN;
    is->audio_clock_serial = af->serial;
#ifdef DEBUG
    {
        static double last_clock;
        printf("audio: delay=%0.3f clock=%0.3f clock0=%0.3f\n",
               is->audio_clock - last_clock,
               is->audio_clock, audio_clock0);
        last_clock = is->audio_clock;
    }
#endif
    return resampled_data_size;
}

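There are three main steps:
1. Compute the number of samples to output from the difference with the video clock. This is done by synchronize_audio:
  1. If audio is behind, output fewer samples
  2. If audio is ahead, output more samples
2. Decide whether resampling is needed: if the number of samples to output differs from the frame's sample count, samples must be removed or added.
3. Resample: use the resampling library to insert or remove samples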
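Note that this differs slightly from video: video sync control is applied to the previous frame's display duration, i.e. to frame_timer, while audio sync control is applied directly to the output samples.
As mentioned earlier, simply deciding at some moment to repeat or discard samples and editing the output audio accordingly would produce a discontinuity that the ear easily notices, giving a poor experience.
Instead, the resampling library is used to remove or add samples smoothly: once the target sample count wanted_nb_samples is known, the combination of swr_set_compensation and swr_convert performs the "resampling".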
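The integer arithmetic audio_decode_frame performs before calling swr_set_compensation and swr_convert can be isolated in a sketch. compensation_params and CompensationParams are hypothetical names; the formulas are copied from the code above, including the 256-sample headroom on out_count.

```c
#include <stdint.h>

typedef struct {
    int sample_delta;          /* samples to add (>0) or remove (<0) */
    int compensation_distance; /* samples over which the change is spread */
    int out_count;             /* output capacity passed to swr_convert() */
} CompensationParams;

static CompensationParams compensation_params(int wanted_nb_samples, int nb_samples,
                                              int src_freq, int tgt_freq)
{
    CompensationParams p;
    /* correction converted to the target rate (C division truncates toward 0) */
    p.sample_delta = (wanted_nb_samples - nb_samples) * tgt_freq / src_freq;
    /* spread the correction over the whole frame, measured at the target rate */
    p.compensation_distance = wanted_nb_samples * tgt_freq / src_freq;
    /* expected output size plus 256 samples of headroom */
    p.out_count = (int)((int64_t)wanted_nb_samples * tgt_freq / src_freq + 256);
    return p;
}
```

For example, squeezing a 1024-sample 44.1 kHz frame to 921 samples with 48 kHz output gives sample_delta = -112 and compensation_distance = 1002: about 112 samples are removed, spread across the whole output frame rather than cut in one place.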
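Note that because samples are added or removed, the total sample count changes while the sample rate stays the same, so audio that originally lasted 1 s is played in more or less than 1 s. This lowers or raises the overall pitch; audibly, the sound becomes deeper or shriller. ffplay accounts for this by setting maximum and minimum adjustment bounds to avoid large pitch changes.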
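The size of the pitch shift can be estimated directly. The sketch below assumes the compensation behaves like a plain change of resampling ratio, as the paragraph above describes; pitch_ratio and pitch_ratio_limits are hypothetical helpers, and SAMPLE_CORRECTION_PERCENT_MAX = 10 is the ffplay constant used in synchronize_audio.

```c
#include <math.h>

#define SAMPLE_CORRECTION_PERCENT_MAX 10  /* value used in ffplay.c */

/* Playing nb_samples of content as wanted_nb_samples output samples at an
 * unchanged output rate scales every frequency by nb_samples / wanted_nb_samples. */
static double pitch_ratio(int nb_samples, int wanted_nb_samples)
{
    return (double)nb_samples / (double)wanted_nb_samples;
}

/* Frequency ratios at the clamp limits allowed by synchronize_audio(). */
static void pitch_ratio_limits(int nb_samples, double *highest, double *lowest)
{
    int min_nb = nb_samples * (100 - SAMPLE_CORRECTION_PERCENT_MAX) / 100;
    int max_nb = nb_samples * (100 + SAMPLE_CORRECTION_PERCENT_MAX) / 100;
    *highest = pitch_ratio(nb_samples, min_nb); /* fewest samples: shriller */
    *lowest  = pitch_ratio(nb_samples, max_nb); /* most samples: deeper */
}
```

At the 10% limits the frequency ratio lies between about 0.909 and 1.111, which bounds how far the pitch can drift while a correction is applied.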

3. synchronize_audio

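With the overall flow understood, let's look at the key function synchronize_audio.
synchronize_audio computes a suitable target sample count from the difference with the video clock, and uses the sample count to control the audio output speed.
Now consider the case where N batches of audio samples are out of sync, each by a rather different amount, so we average to gauge how far out of sync each one is. For example, the first call may show we are 40 ms out of sync, the next 50 ms, and so on. But we don't take a simple average, because recent values matter more than older ones. Instead we use a fractional coefficient audio_diff_avg_coef and accumulate the out-of-sync delays: is->audio_diff_cum = diff + is->audio_diff_avg_coef * is->audio_diff_cum;. To get the average difference we simply compute avg_diff = is->audio_diff_cum * (1.0 - is->audio_diff_avg_coef);. The code is as follows: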

/* return the wanted number of samples to get better sync if sync_type is video
 * or external master clock */
/**
 * @brief If sync_type is video or the external master clock, return the sample count needed for better sync
 * @param is
 * @param nb_samples    number of samples for normal playback
 * @return
 */
static int synchronize_audio(VideoState *is, int nb_samples)
{
    int wanted_nb_samples = nb_samples;

    /* if not master, then we try to remove or add samples to correct the clock */
    if (get_master_sync_type(is) != AV_SYNC_AUDIO_MASTER) {
        double diff, avg_diff;
        int min_nb_samples, max_nb_samples;

        diff = get_clock(&is->audclk) - get_master_clock(is);

        if (!isnan(diff) && fabs(diff) < AV_NOSYNC_THRESHOLD) {
            // diff is within AV_NOSYNC_THRESHOLD: check whether an adjustment is needed
            is->audio_diff_cum = diff + is->audio_diff_avg_coef * is->audio_diff_cum;
            if (is->audio_diff_avg_count < AUDIO_DIFF_AVG_NB) {
                /* not enough measures to have a correct estimate */
                is->audio_diff_avg_count++; // collect AUDIO_DIFF_AVG_NB (20) measurements before correcting
            } else {
                /* estimate the A-V difference */
                avg_diff = is->audio_diff_cum * (1.0 - is->audio_diff_avg_coef);
//                avg_diff = diff;
                if (fabs(avg_diff) >= is->audio_diff_threshold) {
                    wanted_nb_samples = nb_samples + (int)(diff * is->audio_src.freq);
                    min_nb_samples = ((nb_samples * (100 - SAMPLE_CORRECTION_PERCENT_MAX) / 100));
                    max_nb_samples = ((nb_samples * (100 + SAMPLE_CORRECTION_PERCENT_MAX) / 100));
                    // av_clip keeps wanted_nb_samples within [min_nb_samples, max_nb_samples]
                    wanted_nb_samples = av_clip(wanted_nb_samples, min_nb_samples, max_nb_samples);
                }
                av_log(NULL, AV_LOG_INFO, "diff=%f adiff=%f sample_diff=%d apts=%0.3f %f\n",
                       diff, avg_diff, wanted_nb_samples - nb_samples,
                       is->audio_clock, is->audio_diff_threshold);
            }
        } else {
            // beyond AV_NOSYNC_THRESHOLD (or NaN): do not correct, just reset the filter
            /* too big difference : may be initial PTS errors, so
               reset A-V filter */
            is->audio_diff_avg_count = 0;
            is->audio_diff_cum       = 0;   // reset the accumulator
        }
    }

    return wanted_nb_samples;
}

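Like compute_target_delay, this function is one of the more heavily commented ones in ffplay. First we need to understand a "magic algorithm".
There is a group of variables: audio_diff_avg_coef, audio_diff_avg_count, audio_diff_cum and avg_diff. During the first AUDIO_DIFF_AVG_NB (20) measurements, the code only accumulates audio_diff_cum via is->audio_diff_cum = diff + is->audio_diff_avg_coef * is->audio_diff_cum;, which, according to the comment, is to obtain an accurate estimate. After that, the decision is not based on the current instantaneous difference from the master clock; instead an average is computed from the accumulated value, avg_diff = is->audio_diff_cum * (1.0 - is->audio_diff_avg_coef);, and this average is compared with the threshold to decide whether to correct.
The purpose of this formula is to give diffs closer to the present a larger weight: unrolled, audio_diff_cum = diff_n + c·diff_{n-1} + c²·diff_{n-2} + ..., with c = audio_diff_avg_coef < 1, so avg_diff is an exponentially weighted moving average whose weights (1 - c)·c^k shrink with the age k of each measurement.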
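The averaging can be tried out in isolation. DiffFilter, diff_filter_init and diff_filter_push are hypothetical names. ffplay initialises the coefficient as exp(log(0.01) / AUDIO_DIFF_AVG_NB); the sketch hardcodes the equivalent value 10^-0.1 ≈ 0.794, so a diff measured 20 calls ago carries 1% of the weight of the newest one.

```c
#include <math.h>

#define AUDIO_DIFF_AVG_NB 20  /* value used in ffplay.c */

typedef struct {
    double cum;   /* plays the role of audio_diff_cum */
    double coef;  /* plays the role of audio_diff_avg_coef */
    int count;    /* plays the role of audio_diff_avg_count */
} DiffFilter;

static void diff_filter_init(DiffFilter *f)
{
    f->cum = 0.0;
    f->coef = 0.7943282347242815; /* == exp(log(0.01) / 20) == 10^-0.1 */
    f->count = 0;
}

/* Feeds one diff (seconds). Returns 0 while the first AUDIO_DIFF_AVG_NB
 * measurements are being collected; otherwise stores the weighted average
 * in *avg_diff and returns 1. */
static int diff_filter_push(DiffFilter *f, double diff, double *avg_diff)
{
    f->cum = diff + f->coef * f->cum;     /* older diffs decay by coef per call */
    if (f->count < AUDIO_DIFF_AVG_NB) {
        f->count++;
        return 0;
    }
    *avg_diff = f->cum * (1.0 - f->coef); /* weights now sum to almost 1 */
    return 1;
}
```

Feeding a constant diff of 0.05 s, the first 20 calls return 0 (still warming up), the 21st yields avg_diff ≈ 0.0496, and the estimate converges to 0.05 as more measurements arrive.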
Continuing: once avg_diff has been computed, here is how the number of output samples is determined:

wanted_nb_samples = nb_samples + (int)(diff * is->audio_src.freq);
min_nb_samples = ((nb_samples * (100 - SAMPLE_CORRECTION_PERCENT_MAX) / 100));
max_nb_samples = ((nb_samples * (100 + SAMPLE_CORRECTION_PERCENT_MAX) / 100));
// av_clip keeps wanted_nb_samples within [min_nb_samples, max_nb_samples]
wanted_nb_samples = av_clip(wanted_nb_samples, min_nb_samples, max_nb_samples);

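Multiplying the time difference by the sample rate gives the number of samples needed for compensation; adding the original sample count gives the number of samples to output. Because of the pitch issue mentioned in the previous section, the adjustment range is limited to plus or minus 10% (SAMPLE_CORRECTION_PERCENT_MAX).
So a large A/V difference is not eliminated at once: at most 10% of the current frame's sample count is adjusted, and the rest is corrected in subsequent calls.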
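The sample-count correction and its 10% clamp can be reproduced in a few lines. wanted_samples and clip_int are hypothetical helpers (clip_int stands in for av_clip). As in the code above, the correction uses the instantaneous diff, while avg_diff only decides whether to correct at all.

```c
#define SAMPLE_CORRECTION_PERCENT_MAX 10  /* value used in ffplay.c */

static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* diff in seconds (audio clock - master clock), src_freq in Hz */
static int wanted_samples(int nb_samples, double diff, int src_freq)
{
    int wanted = nb_samples + (int)(diff * src_freq); /* (int) truncates toward 0 */
    int min_nb = nb_samples * (100 - SAMPLE_CORRECTION_PERCENT_MAX) / 100;
    int max_nb = nb_samples * (100 + SAMPLE_CORRECTION_PERCENT_MAX) / 100;
    return clip_int(wanted, min_nb, max_nb);
}
```

With 1024-sample frames at 44.1 kHz: a 1 ms lead adds 44 samples (1068), a 5 ms lag asks for 220 fewer samples but is clamped to 921, and a 20 ms lead is clamped to 1126. The remaining error is left for later frames.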
Finally, as with syncing video to audio, there is a quasi-sync band within which no correction is made; its size is audio_diff_threshold:

	is->audio_diff_threshold = (double)(is->audio_hw_buf_size) / is->audio_tgt.bytes_per_sec;

That is, the duration of audio held in one hardware buffer of the output device.
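As a worked example, assume the SDL device was opened with a 2048-sample buffer for 44.1 kHz stereo S16 output (an assumption; the real size depends on what audio_open negotiates). audio_diff_threshold and needs_correction are hypothetical helpers mirroring the initialisation above and the fabs(avg_diff) >= is->audio_diff_threshold test in synchronize_audio.

```c
#include <math.h>

/* Quasi-sync band: seconds of audio in one hardware buffer. */
static double audio_diff_threshold(int hw_buf_size_bytes, int bytes_per_sec)
{
    return (double)hw_buf_size_bytes / bytes_per_sec;
}

/* Same test as synchronize_audio(): correct only outside the band. */
static int needs_correction(double avg_diff, double threshold)
{
    return fabs(avg_diff) >= threshold;
}
```

That gives 8192 / 176400 ≈ 46 ms: an averaged offset of 30 ms is left alone, while 50 ms triggers a correction.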
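That is the main logic of syncing audio to video. In summary:
1. Audio catches up with or waits for video by directly adjusting the number of output samples.
2. To avoid audible discontinuities when adjusting the output samples, the resampling library is used to remove and add samples.
3. The corrected number of output samples is computed with a "magic formula" (a weighted average of past differences).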

4. swr_set_compensation

/**
 * @}
 *
 * @name Low-level option setting functions
 * These functons provide a means to set low-level options that is not possible
 * with the AVOption API.
 * @{
 */

/**
 * Activate resampling compensation ("soft" compensation). This function is
 * internally called when needed in swr_next_pts().
 *
 * @param[in,out] s             allocated Swr context. If it is not initialized,
 *                              or SWR_FLAG_RESAMPLE is not set, swr_init() is
 *                              called with the flag set.
 * @param[in]     sample_delta  delta in PTS per sample
 * @param[in]     compensation_distance number of samples to compensate for
 * @return    >= 0 on success, AVERROR error codes if:
 *            @li @c s is NULL,
 *            @li @c compensation_distance is less than 0,
 *            @li @c compensation_distance is 0 but sample_delta is not,
 *            @li compensation unsupported by resampler, or
 *            @li swr_init() fails when called.
 */
int swr_set_compensation(struct SwrContext *s, int sample_delta, int compensation_distance);

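Activates resampling compensation ("soft" compensation).
This function is called internally by swr_next_pts() when needed.
Parameters: s: allocated Swr context. If it is not initialized, or SWR_FLAG_RESAMPLE is not set, swr_init() is called with the flag set.
sample_delta: delta in PTS per sample
compensation_distance: number of samples to compensate for
Returns: >= 0 on success, or an AVERROR code if:
1. s is NULL,
2. compensation_distance is less than 0,
3. compensation_distance is 0 but sample_delta is not,
4. compensation is not supported by the resampler, or
5. swr_init() fails when called.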
