I. Background
Audio and video take different processing paths in the system and are transported as separate RTP-over-UDP streams. UDP traffic is sensitive to packet loss and network delay, so without extra smoothing the render-time difference between audio and video at the receiver drifts away from their capture-time difference, and that drift is exactly what we perceive as audio/video (lip-sync) desynchronization.
The basic idea of A/V synchronization is to buffer the audio and the video by different, appropriately chosen amounts before rendering on the receive side, so that the render-time difference between audio and video matches their capture-time difference as closely as possible. When those two differences agree, the streams are in sync.
To achieve this, the receiver has to do three things: obtain the absolute (NTP) time difference between the audio and video data on the receive side, obtain the absolute (NTP) capture-time difference on the send side, and compute the render buffer delay (delay time) for each stream.
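A quick worked example with illustrative numbers: suppose an audio block and a video frame are captured at the same instant at the sender, but the audio is ready to play at the receiver after 80 ms while the matching video frame is only decoded after 200 ms. Without compensation the video renders 120 ms late relative to the audio. If the receiver holds the audio back by roughly 120 ms (or sheds that much extra delay from the video path), the render-time difference again matches the capture-time difference and the two streams are back in sync.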
II. How It Works
1) Obtaining the absolute time difference of audio and video data at the receiver
Recording the absolute NTP time of audio and video data at the receiver is simple: every time a packet arrives, update the local system NTP time associated with that packet's RTP timestamp.
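A minimal sketch of that bookkeeping, with hypothetical names that mirror the Measurements fields consumed later by StreamSynchronization::ComputeRelativeDelay (this is not the actual WebRTC receive path):

#include <cstdint>

// Hypothetical per-stream bookkeeping: one instance for audio, one for video.
struct StreamMeasurement {
  uint32_t latest_timestamp = 0;       // RTP timestamp of the newest packet.
  int64_t latest_receive_time_ms = 0;  // Local NTP wall clock when it arrived.
};

// Called for every received RTP packet of the stream: simply remember the
// packet's RTP timestamp and the local NTP time at which it was received.
void OnRtpPacketReceived(StreamMeasurement* m, uint32_t rtp_timestamp,
                         int64_t now_ntp_ms) {
  m->latest_timestamp = rtp_timestamp;
  m->latest_receive_time_ms = now_ntp_ms;
}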
2) Obtaining the absolute capture-time difference at the sender
In many scenarios audio and video capture do not start at the same moment: sometimes audio starts before video, sometimes video before audio, so the capture-time difference is not necessarily zero.
In addition, audio packetization intervals are fairly fixed (10 ms, 20 ms, 40 ms, and so on), while video frame intervals are not. As discussed in an earlier post, the camera capture frame rate is strongly affected by lighting, so the video frame rate can keep changing within a single call.
As a result, the NTP absolute capture-time difference between audio and video is not constant either and has to be recomputed continuously. The overall approach for computing the sender-side capture-time difference at the receiver is to combine the RTP timestamps of the media packets with the NTP time carried in the sender's SR (Sender Report) packets.
1. The RTCP SR report provides the correspondence between the sender's wall-clock (NTP) time and the media (RTP) timestamp:
Sender SSRC: the synchronization source identifier of the media stream this SR report describes.
NTP timestamp: the sender's system NTP time at the moment this SR report is sent.
RTP timestamp: the media (RTP) timestamp corresponding to that same NTP time.
The SR packing code at the sender fills in these fields.
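A simplified sketch of how those three fields relate at send time (illustrative only; the names are hypothetical and this is not the actual WebRTC RTCPSender code). Per RFC 3550, the RTP timestamp in the SR corresponds to the same instant as the NTP timestamp, so it is obtained by extrapolating the last media timestamp by the media clock ticks elapsed since then:

#include <cstdint>

// Hypothetical SR fields relevant to A/V sync.
struct SenderReportFields {
  uint32_t sender_ssrc;    // SSRC of the media stream this SR describes.
  uint64_t ntp_time;       // Sender NTP time when the SR is generated (Q32.32).
  uint32_t rtp_timestamp;  // Media timestamp corresponding to that NTP time.
};

SenderReportFields BuildSenderReport(uint32_t ssrc, uint64_t now_ntp_q32_32,
                                     uint32_t last_rtp_timestamp,
                                     uint64_t last_send_ntp_q32_32,
                                     int clock_rate_hz) {
  // Seconds elapsed since the last media packet was timestamped.
  double elapsed_sec =
      static_cast<double>(now_ntp_q32_32 - last_send_ntp_q32_32) /
      4294967296.0;  // 2^32 NTP fractions per second.
  SenderReportFields sr;
  sr.sender_ssrc = ssrc;
  sr.ntp_time = now_ntp_q32_32;
  // Advance the media clock to "now" so that the pair
  // (ntp_time, rtp_timestamp) describes the same instant.
  sr.rtp_timestamp = last_rtp_timestamp +
                     static_cast<uint32_t>(elapsed_sec * clock_rate_hz + 0.5);
  return sr;
}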
2. Although the video frame rate may change dynamically, the sampling clock rate stays constant for the whole call, which means a given timestamp difference always represents the same amount of absolute time. So once two consecutive SR reports have been received for the same SSRC, the coefficients of the linear equation relating NTP time to the RTP timestamp are determined:

ntp_time = slope × rtp_timestamp + offset

From then on, any received RTP timestamp can be converted to an absolute NTP capture time with this equation. The slope is determined by the media sampling rate (video always uses a 90 kHz RTP clock; audio uses 8 kHz, 16 kHz, 48 kHz, etc. depending on the codec and its parameters), and the offset is the initial timestamp offset (to make tampering and replay attacks on RTP packets harder, timestamps do not start at 0: a random value is drawn at startup and used as the starting timestamp).
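A worked example with illustrative numbers: for a 90 kHz video stream, suppose SR #1 reports RTP timestamp 1000 at NTP time T, and SR #2, sent one second later, reports RTP timestamp 91000 at NTP time T + 1000 ms. Then slope = 1000 ms / 90000 ticks ≈ 0.0111 ms per tick and offset = T − 1000 × slope. A later frame with RTP timestamp 136000 therefore maps to a capture time of roughly T + (136000 − 1000) × 0.0111 ms ≈ T + 1500 ms. (The code below performs the same fit, but in NtpTime's raw fixed-point units rather than milliseconds.)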
This is implemented in the RtpToNtpEstimator class; the core functions are RtpToNtpEstimator::UpdateMeasurements and RtpToNtpEstimator::Estimate.
RtpToNtpEstimator::UpdateMeasurements
From the SR reports pushed in, it computes the media slope (sampling rate) and offset (starting timestamp offset).
RtpToNtpEstimator::UpdateMeasurements works in two steps: it first checks that the incoming measurement is valid, and only then calls RtpToNtpEstimator::UpdateParameters to compute the slope and offset.
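A paraphrased sketch of the kind of validity check performed before a measurement is stored (the exact checks vary between WebRTC versions; the names here are illustrative):

#include <cstdint>

// Illustrative paraphrase: accept a new SR measurement only if time moved
// forward, so duplicated or out-of-order SRs cannot corrupt the linear fit.
bool IsUsableMeasurement(uint64_t new_ntp, uint64_t last_ntp,
                         int64_t new_unwrapped_rtp, int64_t last_unwrapped_rtp) {
  return new_ntp > last_ntp && new_unwrapped_rtp > last_unwrapped_rtp;
}

Once enough valid measurements are stored, RtpToNtpEstimator::UpdateParameters runs a least-squares fit over them: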
void RtpToNtpEstimator::UpdateParameters() {
  size_t n = measurements_.size();
  if (n < 2)
    return;

  // Run linear regression:
  // Given x[] and y[] writes out such k and b that line y=k*x+b approximates
  // given points in the best way (Least Squares Method).
  auto x = [](const RtcpMeasurement& m) {
    return static_cast<double>(m.unwrapped_rtp_timestamp);
  };
  auto y = [](const RtcpMeasurement& m) {
    return static_cast<double>(static_cast<uint64_t>(m.ntp_time));
  };

  double avg_x = 0;
  double avg_y = 0;
  for (const RtcpMeasurement& m : measurements_) {
    avg_x += x(m);
    avg_y += y(m);
  }
  avg_x /= n;
  avg_y /= n;

  double variance_x = 0;
  double covariance_xy = 0;
  for (const RtcpMeasurement& m : measurements_) {
    double normalized_x = x(m) - avg_x;
    double normalized_y = y(m) - avg_y;
    variance_x += normalized_x * normalized_x;
    covariance_xy += normalized_x * normalized_y;
  }

  if (std::fabs(variance_x) < 1e-8)
    return;

  double k = covariance_xy / variance_x;
  double b = avg_y - k * avg_x;

  params_ = {{.slope = k, .offset = b}};
}
RtpToNtpEstimator::Estimate
Using the slope (sampling rate) and offset computed by UpdateParameters, it converts a newly arrived media timestamp into the absolute NTP capture time of that frame.
NtpTime RtpToNtpEstimator::Estimate(uint32_t rtp_timestamp) const {
  if (!params_)
    return NtpTime();

  double estimated =
      static_cast<double>(unwrapper_.Unwrap(rtp_timestamp)) * params_->slope +
      params_->offset + 0.5f;
  return NtpTime(rtc::saturated_cast<uint64_t>(estimated));
}
3) Computing the audio/video render buffer delay (delay time)
The core functions are ComputeRelativeDelay and ComputeDelays in the StreamSynchronization class.
StreamSynchronization::ComputeRelativeDelay
Computes the relative delay, i.e. (the video/audio arrival-time difference at the receiver) minus (the video/audio capture-time difference at the sender).
bool StreamSynchronization::ComputeRelativeDelay(
    const Measurements& audio_measurement,
    const Measurements& video_measurement,
    int* relative_delay_ms) {
  NtpTime audio_last_capture_time =
      audio_measurement.rtp_to_ntp.Estimate(audio_measurement.latest_timestamp);
  if (!audio_last_capture_time.Valid()) {
    return false;
  }
  NtpTime video_last_capture_time =
      video_measurement.rtp_to_ntp.Estimate(video_measurement.latest_timestamp);
  if (!video_last_capture_time.Valid()) {
    return false;
  }
  int64_t audio_last_capture_time_ms = audio_last_capture_time.ToMs();
  int64_t video_last_capture_time_ms = video_last_capture_time.ToMs();

  // Positive diff means that video_measurement is behind audio_measurement.
  *relative_delay_ms =
      video_measurement.latest_receive_time_ms -
      audio_measurement.latest_receive_time_ms -
      (video_last_capture_time_ms - audio_last_capture_time_ms);

  if (*relative_delay_ms > kMaxDeltaDelayMs ||
      *relative_delay_ms < -kMaxDeltaDelayMs) {
    return false;
  }
  return true;
}
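A worked example with illustrative numbers: say the latest audio packet was captured at sender NTP time T and arrived at local time R, while the latest video frame was captured at T + 40 ms and arrived at R + 130 ms. Then relative_delay_ms = 130 − 40 = 90: the video is running 90 ms behind the audio, so synchronization must either add roughly 90 ms of delay to the audio path or remove that much extra delay from the video path. ComputeDelays below decides which, and by how much per step.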
StreamSynchronization::ComputeDelays
Computes the render buffer delay that the audio stream and the video stream each need.
bool StreamSynchronization::ComputeDelays(int relative_delay_ms,
                                          int current_audio_delay_ms,
                                          int* total_audio_delay_target_ms,
                                          int* total_video_delay_target_ms) {
  int current_video_delay_ms = *total_video_delay_target_ms;

  RTC_LOG(LS_VERBOSE) << "Audio delay: " << current_audio_delay_ms
                      << " current diff: " << relative_delay_ms
                      << " for stream " << audio_stream_id_;

  // Calculate the difference between the lowest possible video delay and the
  // current audio delay.
  int current_diff_ms =
      current_video_delay_ms - current_audio_delay_ms + relative_delay_ms;

  avg_diff_ms_ =
      ((kFilterLength - 1) * avg_diff_ms_ + current_diff_ms) / kFilterLength;
  if (abs(avg_diff_ms_) < kMinDeltaMs) {
    // Don't adjust if the diff is within our margin.
    return false;
  }

  // Make sure we don't move too fast.
  int diff_ms = avg_diff_ms_ / 2;
  diff_ms = std::min(diff_ms, kMaxChangeMs);
  diff_ms = std::max(diff_ms, -kMaxChangeMs);

  // Reset the average after a move to prevent overshooting reaction.
  avg_diff_ms_ = 0;

  if (diff_ms > 0) {
    // The minimum video delay is longer than the current audio delay.
    // We need to decrease extra video delay, or add extra audio delay.
    if (video_delay_.extra_ms > base_target_delay_ms_) {
      // We have extra delay added to ViE. Reduce this delay before adding
      // extra delay to VoE.
      video_delay_.extra_ms -= diff_ms;
      audio_delay_.extra_ms = base_target_delay_ms_;
    } else {  // video_delay_.extra_ms > 0
      // We have no extra video delay to remove, increase the audio delay.
      audio_delay_.extra_ms += diff_ms;
      video_delay_.extra_ms = base_target_delay_ms_;
    }
  } else {  // if (diff_ms > 0)
    // The video delay is lower than the current audio delay.
    // We need to decrease extra audio delay, or add extra video delay.
    if (audio_delay_.extra_ms > base_target_delay_ms_) {
      // We have extra delay in VoiceEngine.
      // Start with decreasing the voice delay.
      // Note: diff_ms is negative; add the negative difference.
      audio_delay_.extra_ms += diff_ms;
      video_delay_.extra_ms = base_target_delay_ms_;
    } else {  // audio_delay_.extra_ms > base_target_delay_ms_
      // We have no extra delay in VoiceEngine, increase the video delay.
      // Note: diff_ms is negative; subtract the negative difference.
      video_delay_.extra_ms -= diff_ms;  // X - (-Y) = X + Y.
      audio_delay_.extra_ms = base_target_delay_ms_;
    }
  }

  // Make sure that video is never below our target.
  video_delay_.extra_ms =
      std::max(video_delay_.extra_ms, base_target_delay_ms_);

  int new_video_delay_ms;
  if (video_delay_.extra_ms > base_target_delay_ms_) {
    new_video_delay_ms = video_delay_.extra_ms;
  } else {
    // No change to the extra video delay. We are changing audio and we only
    // allow to change one at the time.
    new_video_delay_ms = video_delay_.last_ms;
  }

  // Make sure that we don't go below the extra video delay.
  new_video_delay_ms = std::max(new_video_delay_ms, video_delay_.extra_ms);

  // Verify we don't go above the maximum allowed video delay.
  new_video_delay_ms =
      std::min(new_video_delay_ms, base_target_delay_ms_ + kMaxDeltaDelayMs);

  int new_audio_delay_ms;
  if (audio_delay_.extra_ms > base_target_delay_ms_) {
    new_audio_delay_ms = audio_delay_.extra_ms;
  } else {
    // No change to the audio delay. We are changing video and we only allow to
    // change one at the time.
    new_audio_delay_ms = audio_delay_.last_ms;
  }

  // Make sure that we don't go below the extra audio delay.
  new_audio_delay_ms = std::max(new_audio_delay_ms, audio_delay_.extra_ms);

  // Verify we don't go above the maximum allowed audio delay.
  new_audio_delay_ms =
      std::min(new_audio_delay_ms, base_target_delay_ms_ + kMaxDeltaDelayMs);

  video_delay_.last_ms = new_video_delay_ms;
  audio_delay_.last_ms = new_audio_delay_ms;

  RTC_LOG(LS_VERBOSE) << "Sync video delay " << new_video_delay_ms
                      << " for video stream " << video_stream_id_
                      << " and audio delay " << audio_delay_.extra_ms
                      << " for audio stream " << audio_stream_id_;

  *total_video_delay_target_ms = new_video_delay_ms;
  *total_audio_delay_target_ms = new_audio_delay_ms;
  return true;
}
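To see how the smoothing and rate limiting interact, a worked example (the constant values shown are illustrative): with kFilterLength = 4, kMinDeltaMs = 30 and kMaxChangeMs = 80, suppose the relative delay settles at +90 ms (video behind audio) and no extra delay has been added yet. On the first call avg_diff_ms_ becomes (3 × 0 + 90) / 4 = 22, which is below the 30 ms margin, so nothing changes. On the next call it becomes (3 × 22 + 90) / 4 = 39, so diff_ms = 39 / 2 = 19 ms (well under the 80 ms cap) is added to the audio extra delay and the average is reset to 0. Repeating this, the audio playout delay creeps up toward the 90 ms target in small capped steps, which avoids audible jumps and overshoot.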
RtpStreamsSynchronizer::UpdateDelay
Applies the computed audio and video target delays.
Taking the video delay adjustment as an example:
syncable_video_->SetMinimumPlayoutDelay corresponds to VideoReceiveStream2::SetMinimumPlayoutDelay
-> VideoReceiveStream2::UpdatePlayoutDelays
-> VCMTiming::set_max_playout_delay
Because network jitter and transmission delay also have to be absorbed, playout cannot follow the computed sync delay exactly. When the render time of a frame is computed, the video render delay is therefore kept within max_playout_delay_. This is implemented in VCMTiming::RenderTimeInternal.
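A paraphrased sketch of that clamping (hypothetical names; the real logic lives in VCMTiming::RenderTimeInternal and also involves the timestamp extrapolator and decode/render margins):

#include <algorithm>
#include <cstdint>

// Illustrative paraphrase: the render time is the frame's estimated local
// capture-aligned time plus the current target delay, clamped between the
// minimum and maximum playout delays configured on VCMTiming.
int64_t RenderTimeSketchMs(int64_t estimated_frame_time_ms,
                           int64_t current_delay_ms,
                           int64_t min_playout_delay_ms,
                           int64_t max_playout_delay_ms) {
  int64_t actual_delay_ms = std::min(
      std::max(current_delay_ms, min_playout_delay_ms), max_playout_delay_ms);
  return estimated_frame_time_ms + actual_delay_ms;
}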
III. References
WebRTC 音视频同步原理与实现 - OSCHINA