I. Background
Audio and video take different processing paths in the system and are transported as separate RTP-over-UDP streams. UDP traffic is sensitive to packet loss and network delay, so without extra smoothing the render-time difference between audio and video at the receiver drifts away from their capture-time difference, and that drift is exactly what we perceive as audio/video (lip-sync) desynchronization.
The basic idea of A/V synchronization is to buffer the audio and the video by different, appropriately chosen amounts before rendering on the receive side, so that the render-time difference between audio and video matches their capture-time difference as closely as possible. When those two differences agree, the streams are in sync.
To achieve this, the receiver has to do three things: obtain the absolute (NTP) time difference between the audio and video data on the receive side, obtain the absolute (NTP) capture-time difference on the send side, and compute the render buffer delay (delay time) for each stream.
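A quick worked example with illustrative numbers: suppose an audio block and a video frame are captured at the same instant at the sender, but the audio is ready to play at the receiver after 80 ms while the matching video frame is only decoded after 200 ms. Without compensation the video renders 120 ms late relative to the audio. If the receiver holds the audio back by roughly 120 ms (or sheds that much extra delay from the video path), the render-time difference again matches the capture-time difference and the two streams are back in sync.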
II. How It Works
1) Obtaining the absolute time difference of audio and video data at the receiver
Recording the absolute NTP time of audio and video data at the receiver is simple: every time a packet arrives, update the local system NTP time associated with that packet's RTP timestamp.
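A minimal sketch of that bookkeeping, with hypothetical names that mirror the Measurements fields consumed later by StreamSynchronization::ComputeRelativeDelay (this is not the actual WebRTC receive path):

#include <cstdint>

// Hypothetical per-stream bookkeeping: one instance for audio, one for video.
struct StreamMeasurement {
  uint32_t latest_timestamp = 0;       // RTP timestamp of the newest packet.
  int64_t latest_receive_time_ms = 0;  // Local NTP wall clock when it arrived.
};

// Called for every received RTP packet of the stream: simply remember the
// packet's RTP timestamp and the local NTP time at which it was received.
void OnRtpPacketReceived(StreamMeasurement* m, uint32_t rtp_timestamp,
                         int64_t now_ntp_ms) {
  m->latest_timestamp = rtp_timestamp;
  m->latest_receive_time_ms = now_ntp_ms;
}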
2) Obtaining the absolute capture-time difference at the sender
In many scenarios audio and video capture do not start at the same moment: sometimes audio starts before video, sometimes video before audio, so the capture-time difference is not necessarily zero.
In addition, audio packetization intervals are fairly fixed (10 ms, 20 ms, 40 ms, and so on), while video frame intervals are not. As discussed in an earlier post, the camera capture frame rate is strongly affected by lighting, so the video frame rate can keep changing within a single call.
As a result, the NTP absolute capture-time difference between audio and video is not constant either and has to be recomputed continuously. The overall approach for computing the sender-side capture-time difference at the receiver is to combine the RTP timestamps of the media packets with the NTP time carried in the sender's SR (Sender Report) packets.
1. The RTCP SR report provides the correspondence between the sender's wall-clock (NTP) time and the media (RTP) timestamp:
Sender SSRC: the synchronization source identifier of the media stream this SR report describes.
NTP timestamp: the sender's system NTP time at the moment this SR report is sent.
RTP timestamp: the media (RTP) timestamp corresponding to that same NTP time.
The SR packing code at the sender fills in these fields.
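A simplified sketch of how those three fields relate at send time (illustrative only; the names are hypothetical and this is not the actual WebRTC RTCPSender code). Per RFC 3550, the RTP timestamp in the SR corresponds to the same instant as the NTP timestamp, so it is obtained by extrapolating the last media timestamp by the media clock ticks elapsed since then:

#include <cstdint>

// Hypothetical SR fields relevant to A/V sync.
struct SenderReportFields {
  uint32_t sender_ssrc;    // SSRC of the media stream this SR describes.
  uint64_t ntp_time;       // Sender NTP time when the SR is generated (Q32.32).
  uint32_t rtp_timestamp;  // Media timestamp corresponding to that NTP time.
};

SenderReportFields BuildSenderReport(uint32_t ssrc, uint64_t now_ntp_q32_32,
                                     uint32_t last_rtp_timestamp,
                                     uint64_t last_send_ntp_q32_32,
                                     int clock_rate_hz) {
  // Seconds elapsed since the last media packet was timestamped.
  double elapsed_sec =
      static_cast<double>(now_ntp_q32_32 - last_send_ntp_q32_32) /
      4294967296.0;  // 2^32 NTP fractions per second.
  SenderReportFields sr;
  sr.sender_ssrc = ssrc;
  sr.ntp_time = now_ntp_q32_32;
  // Advance the media clock to "now" so that the pair
  // (ntp_time, rtp_timestamp) describes the same instant.
  sr.rtp_timestamp = last_rtp_timestamp +
                     static_cast<uint32_t>(elapsed_sec * clock_rate_hz + 0.5);
  return sr;
}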
2. Although the video frame rate may change dynamically, the sampling clock rate stays constant for the whole call, which means a given timestamp difference always represents the same amount of absolute time. So once two consecutive SR reports have been received for the same SSRC, the coefficients of the linear equation relating NTP time to the RTP timestamp are determined:

ntp_time = slope × rtp_timestamp + offset

From then on, any received RTP timestamp can be converted to an absolute NTP capture time with this equation. The slope is determined by the media sampling rate (video always uses a 90 kHz RTP clock; audio uses 8 kHz, 16 kHz, 48 kHz, etc. depending on the codec and its parameters), and the offset is the initial timestamp offset (to make tampering and replay attacks on RTP packets harder, timestamps do not start at 0: a random value is drawn at startup and used as the starting timestamp).
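A worked example with illustrative numbers: for a 90 kHz video stream, suppose SR #1 reports RTP timestamp 1000 at NTP time T, and SR #2, sent one second later, reports RTP timestamp 91000 at NTP time T + 1000 ms. Then slope = 1000 ms / 90000 ticks ≈ 0.0111 ms per tick and offset = T − 1000 × slope. A later frame with RTP timestamp 136000 therefore maps to a capture time of roughly T + (136000 − 1000) × 0.0111 ms ≈ T + 1500 ms. (The code below performs the same fit, but in NtpTime's raw fixed-point units rather than milliseconds.)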
This is implemented in the RtpToNtpEstimator class; the core functions are RtpToNtpEstimator::UpdateMeasurements and RtpToNtpEstimator::Estimate.
RtpToNtpEstimator::UpdateMeasurements
From the SR reports pushed in, it computes the media slope (sampling rate) and offset (starting timestamp offset).
RtpToNtpEstimator::UpdateMeasurements works in two steps: it first checks that the incoming measurement is valid, and only then calls RtpToNtpEstimator::UpdateParameters to compute the slope and offset.
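A paraphrased sketch of the kind of validity check performed before a measurement is stored (the exact checks vary between WebRTC versions; the names here are illustrative):

#include <cstdint>

// Illustrative paraphrase: accept a new SR measurement only if time moved
// forward, so duplicated or out-of-order SRs cannot corrupt the linear fit.
bool IsUsableMeasurement(uint64_t new_ntp, uint64_t last_ntp,
                         int64_t new_unwrapped_rtp, int64_t last_unwrapped_rtp) {
  return new_ntp > last_ntp && new_unwrapped_rtp > last_unwrapped_rtp;
}

Once enough valid measurements are stored, RtpToNtpEstimator::UpdateParameters runs a least-squares fit over them: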
void RtpToNtpEstimator::UpdateParameters() {
  size_t n = measurements_.size();
  if (n < 2)
    return;

  // Run linear regression:
  // Given x[] and y[] writes out such k and b that line y=k*x+b approximates
  // given points in the best way (Least Squares Method).
  auto x = [](const RtcpMeasurement& m) {
    return static_cast<double>(m.unwrapped_rtp_timestamp);
  };
  auto y = [](const RtcpMeasurement& m) {
    return static_cast<double>(static_cast<uint64_t>(m.ntp_time));
  };

  double avg_x = 0;
  double avg_y = 0;
  for (const RtcpMeasurement& m : measurements_) {
    avg_x += x(m);
    avg_y += y(m);
  }
  avg_x /= n;
  avg_y /= n;

  double variance_x = 0;
  double covariance_xy = 0;
  for (const RtcpMeasurement& m : measurements_) {
    double normalized_x = x(m) - avg_x;
    double normalized_y = y(m) - avg_y;
    variance_x += normalized_x * normalized_x;
    covariance_xy += normalized_x * normalized_y;
  }

  if (std::fabs(variance_x) < 1e-8)
    return;

  double k = covariance_xy / variance_x;
  double b = avg_y - k * avg_x;

  params_ = {{.slope = k, .offset = b}};
}
RtpToNtpEstimator::Estimate
Using the slope (sampling rate) and offset computed by UpdateParameters, it converts a newly arrived media timestamp into the absolute NTP capture time of that frame.
NtpTime RtpToNtpEstimator::Estimate(uint32_t rtp_timestamp) const {
  if (!params_)
    return NtpTime();

  double estimated =
      static_cast<double>(unwrapper_.Unwrap(rtp_timestamp)) * params_->slope +
      params_->offset + 0.5f;
  return NtpTime(rtc::saturated_cast<uint64_t>(estimated));
}
3) Computing the audio/video render buffer delay (delay time)
The core functions are ComputeRelativeDelay and ComputeDelays in the StreamSynchronization class.
StreamSynchronization::ComputeRelativeDelay
Computes the relative delay, i.e. (the video/audio arrival-time difference at the receiver) minus (the video/audio capture-time difference at the sender).
bool StreamSynchronization::ComputeRelativeDelay(
    const Measurements& audio_measurement,
    const Measurements& video_measurement,
    int* relative_delay_ms) {
  NtpTime audio_last_capture_time =
      audio_measurement.rtp_to_ntp.Estimate(audio_measurement.latest_timestamp);
  if (!audio_last_capture_time.Valid()) {
    return false;
  }
  NtpTime video_last_capture_time =
      video_measurement.rtp_to_ntp.Estimate(video_measurement.latest_timestamp);
  if (!video_last_capture_time.Valid()) {
    return false;
  }
  int64_t audio_last_capture_time_ms = audio_last_capture_time.ToMs();
  int64_t video_last_capture_time_ms = video_last_capture_time.ToMs();

  // Positive diff means that video_measurement is behind audio_measurement.
  *relative_delay_ms =
      video_measurement.latest_receive_time_ms -
      audio_measurement.latest_receive_time_ms -
      (video_last_capture_time_ms - audio_last_capture_time_ms);

  if (*relative_delay_ms > kMaxDeltaDelayMs ||
      *relative_delay_ms < -kMaxDeltaDelayMs) {
    return false;
  }
  return true;
}
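A worked example with illustrative numbers: say the latest audio packet was captured at sender NTP time T and arrived at local time R, while the latest video frame was captured at T + 40 ms and arrived at R + 130 ms. Then relative_delay_ms = 130 − 40 = 90: the video is running 90 ms behind the audio, so synchronization must either add roughly 90 ms of delay to the audio path or remove that much extra delay from the video path. ComputeDelays below decides which, and by how much per step.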
StreamSynchronization::ComputeDelays
Computes the render buffer delay that the audio stream and the video stream each need.
bool StreamSynchronization::ComputeDelays(int relative_delay_ms,
                                          int current_audio_delay_ms,
                                          int* total_audio_delay_target_ms,
                                          int* total_video_delay_target_ms) {
  int current_video_delay_ms = *total_video_delay_target_ms;

  RTC_LOG(LS_VERBOSE) << "Audio delay: " << current_audio_delay_ms
                      << " current diff: " << relative_delay_ms
                      << " for stream " << audio_stream_id_;

  // Calculate the difference between the lowest possible video delay and the
  // current audio delay.
  int current_diff_ms =
      current_video_delay_ms - current_audio_delay_ms + relative_delay_ms;

  avg_diff_ms_ =
      ((kFilterLength - 1) * avg_diff_ms_ + current_diff_ms) / kFilterLength;
  if (abs(avg_diff_ms_) < kMinDeltaMs) {
    // Don't adjust if the diff is within our margin.
    return false;
  }

  // Make sure we don't move too fast.
  int diff_ms = avg_diff_ms_ / 2;
  diff_ms = std::min(diff_ms, kMaxChangeMs);
  diff_ms = std::max(diff_ms, -kMaxChangeMs);

  // Reset the average after a move to prevent overshooting reaction.
  avg_diff_ms_ = 0;

  if (diff_ms > 0) {
    // The minimum video delay is longer than the current audio delay.
    // We need to decrease extra video delay, or add extra audio delay.
    if (video_delay_.extra_ms > base_target_delay_ms_) {
      // We have extra delay added to ViE. Reduce this delay before adding
      // extra delay to VoE.
      video_delay_.extra_ms -= diff_ms;
      audio_delay_.extra_ms = base_target_delay_ms_;
    } else {  // video_delay_.extra_ms > 0
      // We have no extra video delay to remove, increase the audio delay.
      audio_delay_.extra_ms += diff_ms;
      video_delay_.extra_ms = base_target_delay_ms_;
    }
  } else {  // if (diff_ms > 0)
    // The video delay is lower than the current audio delay.
    // We need to decrease extra audio delay, or add extra video delay.
    if (audio_delay_.extra_ms > base_target_delay_ms_) {
      // We have extra delay in VoiceEngine.
      // Start with decreasing the voice delay.
      // Note: diff_ms is negative; add the negative difference.
      audio_delay_.extra_ms += diff_ms;
      video_delay_.extra_ms = base_target_delay_ms_;
    } else {  // audio_delay_.extra_ms > base_target_delay_ms_
      // We have no extra delay in VoiceEngine, increase the video delay.
      // Note: diff_ms is negative; subtract the negative difference.
      video_delay_.extra_ms -= diff_ms;  // X - (-Y) = X + Y.
      audio_delay_.extra_ms = base_target_delay_ms_;
    }
  }

  // Make sure that video is never below our target.
  video_delay_.extra_ms =
      std::max(video_delay_.extra_ms, base_target_delay_ms_);

  int new_video_delay_ms;
  if (video_delay_.extra_ms > base_target_delay_ms_) {
    new_video_delay_ms = video_delay_.extra_ms;
  } else {
    // No change to the extra video delay. We are changing audio and we only
    // allow to change one at the time.
    new_video_delay_ms = video_delay_.last_ms;
  }

  // Make sure that we don't go below the extra video delay.
  new_video_delay_ms = std::max(new_video_delay_ms, video_delay_.extra_ms);

  // Verify we don't go above the maximum allowed video delay.
  new_video_delay_ms =
      std::min(new_video_delay_ms, base_target_delay_ms_ + kMaxDeltaDelayMs);

  int new_audio_delay_ms;
  if (audio_delay_.extra_ms > base_target_delay_ms_) {
    new_audio_delay_ms = audio_delay_.extra_ms;
  } else {
    // No change to the audio delay. We are changing video and we only allow to
    // change one at the time.
    new_audio_delay_ms = audio_delay_.last_ms;
  }

  // Make sure that we don't go below the extra audio delay.
  new_audio_delay_ms = std::max(new_audio_delay_ms, audio_delay_.extra_ms);

  // Verify we don't go above the maximum allowed audio delay.
  new_audio_delay_ms =
      std::min(new_audio_delay_ms, base_target_delay_ms_ + kMaxDeltaDelayMs);

  video_delay_.last_ms = new_video_delay_ms;
  audio_delay_.last_ms = new_audio_delay_ms;

  RTC_LOG(LS_VERBOSE) << "Sync video delay " << new_video_delay_ms
                      << " for video stream " << video_stream_id_
                      << " and audio delay " << audio_delay_.extra_ms
                      << " for audio stream " << audio_stream_id_;

  *total_video_delay_target_ms = new_video_delay_ms;
  *total_audio_delay_target_ms = new_audio_delay_ms;
  return true;
}
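To see how the smoothing and rate limiting interact, a worked example (the constant values shown are illustrative): with kFilterLength = 4, kMinDeltaMs = 30 and kMaxChangeMs = 80, suppose the relative delay settles at +90 ms (video behind audio) and no extra delay has been added yet. On the first call avg_diff_ms_ becomes (3 × 0 + 90) / 4 = 22, which is below the 30 ms margin, so nothing changes. On the next call it becomes (3 × 22 + 90) / 4 = 39, so diff_ms = 39 / 2 = 19 ms (well under the 80 ms cap) is added to the audio extra delay and the average is reset to 0. Repeating this, the audio playout delay creeps up toward the 90 ms target in small capped steps, which avoids audible jumps and overshoot.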
RtpStreamsSynchronizer::UpdateDelay
Applies the computed audio and video target delays.
Taking the video delay adjustment as an example:
syncable_video_->SetMinimumPlayoutDelay corresponds to VideoReceiveStream2::SetMinimumPlayoutDelay
-> VideoReceiveStream2::UpdatePlayoutDelays
-> VCMTiming::set_max_playout_delay
Because network jitter and transmission delay also have to be absorbed, playout cannot follow the computed sync delay exactly. When the render time of a frame is computed, the video render delay is therefore kept within max_playout_delay_. This is implemented in VCMTiming::RenderTimeInternal.
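A paraphrased sketch of that clamping (hypothetical names; the real logic lives in VCMTiming::RenderTimeInternal and also involves the timestamp extrapolator and decode/render margins):

#include <algorithm>
#include <cstdint>

// Illustrative paraphrase: the render time is the frame's estimated local
// capture-aligned time plus the current target delay, clamped between the
// minimum and maximum playout delays configured on VCMTiming.
int64_t RenderTimeSketchMs(int64_t estimated_frame_time_ms,
                           int64_t current_delay_ms,
                           int64_t min_playout_delay_ms,
                           int64_t max_playout_delay_ms) {
  int64_t actual_delay_ms = std::min(
      std::max(current_delay_ms, min_playout_delay_ms), max_playout_delay_ms);
  return estimated_frame_time_ms + actual_delay_ms;
}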
III. References
WebRTC 音视频同步原理与实现 - OSCHINA