WebRTC QoS Method 11: Audio/Video Synchronization (AVSync) Implementation

I. Background

Audio and video take different processing paths through the system and are transported as separate RTP-over-UDP media streams. UDP traffic is sensitive to network loss and delay, so without extra smoothing the relative delay between audio and video at render time drifts away from the relative delay at capture time, which easily produces audio/video desynchronization.

The basic idea of audio/video synchronization is to buffer audio and video at the receiver for appropriate (and usually different) amounts of time before rendering, so that the render-time difference between audio and video stays as close as possible to their capture-time difference. Keeping those two differences equal keeps the streams in sync.

To achieve audio/video synchronization, the receiver needs to do three things: obtain the receiver-side absolute NTP time difference between audio and video, obtain the sender-side capture absolute NTP time difference, and compute the render buffering delay (delaytime) for audio and video.

II. Implementation Principles

1) Obtaining the receiver-side absolute time difference

Recording the receiver-side absolute NTP time is straightforward: every time an RTP packet is received, update the local system NTP time associated with that packet's RTP timestamp.
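
A minimal sketch of this bookkeeping, assuming a per-stream struct and a local NTP-based clock (the names here are illustrative; in WebRTC this information is collected per receive stream and handed to the synchronization module):

#include <cstdint>

// Illustrative per-stream state the receiver keeps for synchronization.
struct StreamReceiveInfo {
  uint32_t latest_rtp_timestamp = 0;   // RTP timestamp of the newest packet.
  int64_t latest_receive_time_ms = 0;  // Local NTP-based arrival time, in ms.
};

// Called from the RTP receive path for every incoming packet.
void OnRtpPacket(StreamReceiveInfo& info, uint32_t rtp_timestamp,
                 int64_t now_ms) {
  info.latest_rtp_timestamp = rtp_timestamp;
  info.latest_receive_time_ms = now_ms;
}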

2) Obtaining the sender-side capture absolute time difference

In many scenarios audio and video are not captured at the same moment; sometimes audio starts before video and sometimes the other way around, so the capture-time difference is not necessarily zero.

Moreover, audio framing durations are fairly fixed (10 ms, 20 ms, 40 ms, and so on), while video frame intervals are not: as discussed in an earlier post, the camera capture frame rate is strongly affected by lighting, so the video frame rate can keep changing within a single call.

The capture-side NTP time difference between audio and video is therefore not constant and has to be recalculated continuously. The overall approach for computing the sender-side capture-time difference at the receiver is to combine the RTP timestamps of the media packets with the NTP time carried in the sender's SR (Sender Report) packets.

1. The RTCP SR report carries the correspondence between the capture (wall-clock) time of the media stream and its RTP timestamp at the moment the report was generated.

Sender SSRC: the synchronization source identifier of the media stream this SR report describes.

NTP timestamp: the sender's system NTP time at the moment this SR report is sent.

RTP timestamp: the media RTP timestamp corresponding to that same NTP time.

The SR packing itself is done in WebRTC's RTCP sender code; the fields above are the ones that matter for synchronization.
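
As a minimal sketch (following RFC 3550, section 6.4.1; the struct and field names are illustrative, not WebRTC's actual packet classes), the relevant SR fields are:

#include <cstdint>

// Illustrative layout of the RTCP SR fields used for AV sync (RFC 3550, 6.4.1).
struct RtcpSenderInfo {
  uint32_t sender_ssrc;          // SSRC of the media stream this SR describes.
  uint32_t ntp_seconds;          // NTP timestamp at send time, integer part.
  uint32_t ntp_fractions;        // NTP timestamp at send time, fractional part.
  uint32_t rtp_timestamp;        // Media RTP timestamp for that same instant.
  uint32_t sender_packet_count;  // Total RTP packets sent so far.
  uint32_t sender_octet_count;   // Total RTP payload octets sent so far.
};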

2. Although the video frame rate may change dynamically, the RTP clock rate does not change during a call, so a given timestamp delta always represents the same amount of absolute time. Therefore, once two SR reports have been received for the same SSRC, the coefficients of the linear relation between NTP time and RTP timestamp are determined, and any later timestamp can be mapped to an absolute NTP capture time through this linear equation: Tntp = slope × Trtp + offset. Here slope is the amount of absolute (NTP) time represented by one RTP tick, determined by the media clock rate (video uses a fixed 90 kHz clock; audio uses 8 kHz, 16 kHz, 48 kHz, etc. depending on codec parameters), and offset is the starting offset of the RTP timestamps (to make packet tampering and spoofing harder, RTP timestamps do not start at 0; a random initial value is chosen when the system starts).
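
A quick worked example of this equation, with made-up numbers and a 90 kHz video clock:

#include <cstdio>

int main() {
  // Two hypothetical SR reports for one video SSRC (90 kHz clock):
  //   SR1: NTP time 10000 ms, RTP timestamp 4500
  //   SR2: NTP time 11000 ms, RTP timestamp 94500 (one second = 90000 ticks later)
  const double t1_ms = 10000.0, rtp1 = 4500.0;
  const double t2_ms = 11000.0, rtp2 = 94500.0;

  // Fit Tntp = slope * Trtp + offset through the two points.
  const double slope = (t2_ms - t1_ms) / (rtp2 - rtp1);  // 1/90 ms per tick.
  const double offset = t1_ms - slope * rtp1;            // 9950 ms.

  // Any later RTP timestamp now maps to an absolute capture time:
  // rtp = 139500 (half a second after SR2) -> 11500 ms.
  const double capture_ms = slope * 139500.0 + offset;
  std::printf("slope=%.6f offset=%.1f capture=%.1f\n", slope, offset, capture_ms);
  return 0;
}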

This functionality is implemented in the RtpToNtpEstimator class; the core functions are RtpToNtpEstimator::UpdateMeasurements and RtpToNtpEstimator::Estimate.

RtpToNtpEstimator::UpdateMeasurements

From the SR reports pushed into it, this function derives the stream's slope (clock-rate term) and offset (starting offset).

RtpToNtpEstimator::UpdateMeasurements works in two steps: it first validates the incoming SR report, and only if the report passes does it call RtpToNtpEstimator::UpdateParameters to compute the stream's slope and offset. A rough sketch of that flow, followed by the actual UpdateParameters implementation, is shown below.
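
Sketch of the two-step flow (the class name, signature, and container here are assumptions for illustration; the real function also rejects duplicate and reordered reports and bounds the number of stored measurements):

#include <cstdint>
#include <vector>

// Sketch only: the shape of the SR-ingestion step, not WebRTC's exact code.
class RtpToNtpEstimatorSketch {
 public:
  bool UpdateMeasurements(uint64_t ntp_time_q32x32, int64_t unwrapped_rtp_ts) {
    // Step 1: validity checks - ignore unset or non-increasing NTP times.
    if (ntp_time_q32x32 == 0)
      return false;
    if (!measurements_.empty() &&
        ntp_time_q32x32 <= measurements_.back().ntp_time_q32x32)
      return false;

    // Step 2: store the (RTP timestamp, NTP time) pair and refit the line.
    measurements_.push_back({unwrapped_rtp_ts, ntp_time_q32x32});
    UpdateParameters();  // Least-squares fit, shown in the real code below.
    return true;
  }

 private:
  struct Measurement {
    int64_t unwrapped_rtp_timestamp;
    uint64_t ntp_time_q32x32;  // NTP time as a 64-bit Q32.32 value.
  };
  void UpdateParameters() {}  // Placeholder; see the actual implementation below.
  std::vector<Measurement> measurements_;
};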

void RtpToNtpEstimator::UpdateParameters() {
  size_t n = measurements_.size();
  if (n < 2)
    return;

  // Run linear regression:
  // Given x[] and y[] writes out such k and b that line y=k*x+b approximates
  // given points in the best way (Least Squares Method).
  auto x = [](const RtcpMeasurement& m) {
    return static_cast<double>(m.unwrapped_rtp_timestamp);
  };
  auto y = [](const RtcpMeasurement& m) {
    return static_cast<double>(static_cast<uint64_t>(m.ntp_time));
  };

  double avg_x = 0;
  double avg_y = 0;
  for (const RtcpMeasurement& m : measurements_) {
    avg_x += x(m);
    avg_y += y(m);
  }
  avg_x /= n;
  avg_y /= n;

  double variance_x = 0;
  double covariance_xy = 0;
  for (const RtcpMeasurement& m : measurements_) {
    double normalized_x = x(m) - avg_x;
    double normalized_y = y(m) - avg_y;
    variance_x += normalized_x * normalized_x;
    covariance_xy += normalized_x * normalized_y;
  }

  if (std::fabs(variance_x) < 1e-8)
    return;

  double k = covariance_xy / variance_x;
  double b = avg_y - k * avg_x;
  params_ = {{.slope = k, .offset = b}};
}

RtpToNtpEstimator::Estimate

Using the slope and offset computed by UpdateParameters, this function maps a newly received media RTP timestamp to the absolute NTP capture time of that frame.

NtpTime RtpToNtpEstimator::Estimate(uint32_t rtp_timestamp) const {
  if (!params_)
    return NtpTime();

  double estimated =
      static_cast<double>(unwrapper_.Unwrap(rtp_timestamp)) * params_->slope +
      params_->offset + 0.5f;

  return NtpTime(rtc::saturated_cast<uint64_t>(estimated));
}

3) Computing the audio/video render buffering delay (delaytime)

The core functions are ComputeRelativeDelay and ComputeDelays in the StreamSynchronization class.

StreamSynchronization::ComputeRelativeDelay

Computes the relative delay between the streams, i.e. (audio/video render-time difference) − (audio/video capture-time difference):

bool StreamSynchronization::ComputeRelativeDelay(
    const Measurements& audio_measurement,
    const Measurements& video_measurement,
    int* relative_delay_ms) {
  NtpTime audio_last_capture_time =
      audio_measurement.rtp_to_ntp.Estimate(audio_measurement.latest_timestamp);
  if (!audio_last_capture_time.Valid()) {
    return false;
  }
  NtpTime video_last_capture_time =
      video_measurement.rtp_to_ntp.Estimate(video_measurement.latest_timestamp);
  if (!video_last_capture_time.Valid()) {
    return false;
  }
  int64_t audio_last_capture_time_ms = audio_last_capture_time.ToMs();
  int64_t video_last_capture_time_ms = video_last_capture_time.ToMs();

  // Positive diff means that video_measurement is behind audio_measurement.
  *relative_delay_ms =
      video_measurement.latest_receive_time_ms -
      audio_measurement.latest_receive_time_ms -
      (video_last_capture_time_ms - audio_last_capture_time_ms);

  if (*relative_delay_ms > kMaxDeltaDelayMs ||
      *relative_delay_ms < -kMaxDeltaDelayMs) {
    return false;
  }
  return true;
}
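
As a made-up example of the formula above: if an audio frame and a video frame were captured at the same NTP time (capture difference 0 ms) but the video packet arrives 80 ms after the audio packet, then relative_delay_ms = 80 ms − 0 ms = 80 ms, meaning video is 80 ms behind audio at the receiver and needs to catch up.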

StreamSynchronization::ComputeDelays

Computes the extra render buffering delay that the audio and video streams each need:

bool StreamSynchronization::ComputeDelays(int relative_delay_ms,
                                          int current_audio_delay_ms,
                                          int* total_audio_delay_target_ms,
                                          int* total_video_delay_target_ms) {
  int current_video_delay_ms = *total_video_delay_target_ms;

  RTC_LOG(LS_VERBOSE) << "Audio delay: " << current_audio_delay_ms
                      << " current diff: " << relative_delay_ms
                      << " for stream " << audio_stream_id_;

  // Calculate the difference between the lowest possible video delay and the
  // current audio delay.
  int current_diff_ms =
      current_video_delay_ms - current_audio_delay_ms + relative_delay_ms;

  avg_diff_ms_ =
      ((kFilterLength - 1) * avg_diff_ms_ + current_diff_ms) / kFilterLength;
  if (abs(avg_diff_ms_) < kMinDeltaMs) {
    // Don't adjust if the diff is within our margin.
    return false;
  }

  // Make sure we don't move too fast.
  int diff_ms = avg_diff_ms_ / 2;
  diff_ms = std::min(diff_ms, kMaxChangeMs);
  diff_ms = std::max(diff_ms, -kMaxChangeMs);

  // Reset the average after a move to prevent overshooting reaction.
  avg_diff_ms_ = 0;

  if (diff_ms > 0) {
    // The minimum video delay is longer than the current audio delay.
    // We need to decrease extra video delay, or add extra audio delay.
    if (video_delay_.extra_ms > base_target_delay_ms_) {
      // We have extra delay added to ViE. Reduce this delay before adding
      // extra delay to VoE.
      video_delay_.extra_ms -= diff_ms;
      audio_delay_.extra_ms = base_target_delay_ms_;
    } else {  // video_delay_.extra_ms > 0
      // We have no extra video delay to remove, increase the audio delay.
      audio_delay_.extra_ms += diff_ms;
      video_delay_.extra_ms = base_target_delay_ms_;
    }
  } else {  // if (diff_ms > 0)
    // The video delay is lower than the current audio delay.
    // We need to decrease extra audio delay, or add extra video delay.
    if (audio_delay_.extra_ms > base_target_delay_ms_) {
      // We have extra delay in VoiceEngine.
      // Start with decreasing the voice delay.
      // Note: diff_ms is negative; add the negative difference.
      audio_delay_.extra_ms += diff_ms;
      video_delay_.extra_ms = base_target_delay_ms_;
    } else {  // audio_delay_.extra_ms > base_target_delay_ms_
      // We have no extra delay in VoiceEngine, increase the video delay.
      // Note: diff_ms is negative; subtract the negative difference.
      video_delay_.extra_ms -= diff_ms;  // X - (-Y) = X + Y.
      audio_delay_.extra_ms = base_target_delay_ms_;
    }
  }

  // Make sure that video is never below our target.
  video_delay_.extra_ms =
      std::max(video_delay_.extra_ms, base_target_delay_ms_);

  int new_video_delay_ms;
  if (video_delay_.extra_ms > base_target_delay_ms_) {
    new_video_delay_ms = video_delay_.extra_ms;
  } else {
    // No change to the extra video delay. We are changing audio and we only
    // allow to change one at the time.
    new_video_delay_ms = video_delay_.last_ms;
  }

  // Make sure that we don't go below the extra video delay.
  new_video_delay_ms = std::max(new_video_delay_ms, video_delay_.extra_ms);

  // Verify we don't go above the maximum allowed video delay.
  new_video_delay_ms =
      std::min(new_video_delay_ms, base_target_delay_ms_ + kMaxDeltaDelayMs);

  int new_audio_delay_ms;
  if (audio_delay_.extra_ms > base_target_delay_ms_) {
    new_audio_delay_ms = audio_delay_.extra_ms;
  } else {
    // No change to the audio delay. We are changing video and we only allow to
    // change one at the time.
    new_audio_delay_ms = audio_delay_.last_ms;
  }

  // Make sure that we don't go below the extra audio delay.
  new_audio_delay_ms = std::max(new_audio_delay_ms, audio_delay_.extra_ms);

  // Verify we don't go above the maximum allowed audio delay.
  new_audio_delay_ms =
      std::min(new_audio_delay_ms, base_target_delay_ms_ + kMaxDeltaDelayMs);

  video_delay_.last_ms = new_video_delay_ms;
  audio_delay_.last_ms = new_audio_delay_ms;

  RTC_LOG(LS_VERBOSE) << "Sync video delay " << new_video_delay_ms
                      << " for video stream " << video_stream_id_
                      << " and audio delay " << audio_delay_.extra_ms
                      << " for audio stream " << audio_stream_id_;

  *total_video_delay_target_ms = new_video_delay_ms;
  *total_audio_delay_target_ms = new_audio_delay_ms;
  return true;
}
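
A rough sketch of how the two calls are chained once per synchronization interval (simplified; in WebRTC this is driven by RtpStreamsSynchronizer, and the variable names here are illustrative):

// Assumes the StreamSynchronization header from the snippets above is available.
void SyncOnce(StreamSynchronization& sync,
              const StreamSynchronization::Measurements& audio,
              const StreamSynchronization::Measurements& video,
              int current_audio_delay_ms,
              int current_video_delay_ms) {
  int relative_delay_ms = 0;
  if (!sync.ComputeRelativeDelay(audio, video, &relative_delay_ms))
    return;  // Not enough SR data yet, or an implausibly large delta.

  int target_audio_delay_ms = 0;
  int target_video_delay_ms = current_video_delay_ms;  // Read by ComputeDelays.
  if (sync.ComputeDelays(relative_delay_ms, current_audio_delay_ms,
                         &target_audio_delay_ms, &target_video_delay_ms)) {
    // Apply the new targets, e.g. via SetMinimumPlayoutDelay on each stream.
  }
}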

RtpStreamsSynchronizer::UpdateDelay

Applies the computed audio/video target delays.

Taking the video-side delay adjustment as an example, the call chain is:

syncable_video_->SetMinimumPlayoutDelay, which maps to VideoReceiveStream2::SetMinimumPlayoutDelay
->VideoReceiveStream2::UpdatePlayoutDelays
->VCMTiming::set_max_playout_delay

Because network jitter and transmission delay also have to be accommodated, playout cannot follow the computed synchronization delay exactly; when the render time is calculated, the video render delay is kept within max_playout_delay_. This is implemented in VCMTiming::RenderTimeInternal.
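
The clamping idea can be sketched as follows (a simplified illustration, not the actual VCMTiming code): the delay that drives the render timestamp is limited to the configured playout-delay window, so the synchronization logic can never push the video delay beyond the allowed maximum.

#include <algorithm>
#include <cstdint>

// Illustrative only: clamp the estimated target delay (jitter + decode +
// render) into [min_playout_delay, max_playout_delay] before computing the
// render time.
int64_t ClampedTargetDelayMs(int64_t estimated_delay_ms,
                             int64_t min_playout_delay_ms,
                             int64_t max_playout_delay_ms) {
  return std::min(std::max(estimated_delay_ms, min_playout_delay_ms),
                  max_playout_delay_ms);
}

int64_t RenderTimeMs(int64_t frame_capture_time_ms, int64_t estimated_delay_ms,
                     int64_t min_playout_delay_ms, int64_t max_playout_delay_ms) {
  return frame_capture_time_ms +
         ClampedTargetDelayMs(estimated_delay_ms, min_playout_delay_ms,
                              max_playout_delay_ms);
}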

