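Analysis of the mediasoup simulcast consumer
1. What is simulcast
WebRTC supports simulcast: the captured video frames are encoded into several streams that are sent at the same time, each with its own resolution and frame rate. mediasoup then picks the appropriate stream to forward to each of the other participants, based on that participant's network bandwidth or on other inputs such as an explicit choice made through the API.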
2. Current state of simulcast in mediasoup
In WebRTC M70 and later, the VP8, VP9 and H264 encoders all support simulcast, but the mediasoup SFU currently supports simulcast only for VP8 and H264.
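3. How the mediasoup simulcast consumer is implemented
The figure above is the architecture diagram from the mediasoup website. Every viewer in the room has two consumers of its own, an audio consumer and a video consumer. For an ordinary WebRTC viewer, the video consumer receives a single-resolution stream from the publisher. A simulcast consumer has the same structure, again with an audio consumer and a video consumer, but the simulcast video consumer receives RTP streams of several resolutions from the publisher at the same time.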
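A simulcast consumer receives several streams and switches between them, so three key questions have to be answered:
- What conditions trigger a switch?
- How is the right stream chosen for the viewer?
- The RTP streams of different resolutions may use different sequence numbers and timestamps, so a switch can cause problems such as audio/video desynchronization at the viewer. How is the viewer's audio/video experience protected?
The following sections look at how the mediasoup simulcast consumer handles each of these three questions.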
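3.1 Terminology
Term | Meaning |
---|---|
spatialLayer | Resolution |
temporalLayer | Frame rate |
score | Quality of the current stream, from 0 to 10; 0 means the stream is no longer being sent |
For example: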
Spatial Layer (0): 320x240
Spatial Layer (1): 640x480
Spatial Layer (2): 1280x720
Temporal Layer (0): 3
Temporal Layer (1): 11
Temporal Layer (2): 14
3.2 When a switch is triggered
3.2.1 Manual selection by the viewer
The corresponding mediasoup JSON request:
{
"id":211,
"method":"consumer.setPreferredLayers",
"internal":{
"routerId":"6483a8a3-fa90-49f7-9f02-c9fa2e93e8f4",
"transportId":"88636bae-3aa7-478d-b187-2d4d420daac7",
"consumerId":"b93057a9-c355-4ee1-98d7-2e7757c0f5ec",
"producerId":"cb9c8eea-fd6f-4b22-bd29-1a29b61c4304"
},
"data":{
"spatialLayer":2,
"temporalLayer":2
}
}
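Without bandwidth estimation, the final values are simply the spatialLayer and temporalLayer requested by the viewer, and the viewer switches directly to the selected stream.
With bandwidth estimation, the requested spatialLayer and temporalLayer are stored only as the preferred values; the actual target spatialLayer and temporalLayer are the result of a computation that is described in section 3.3.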
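The difference between the two cases can be sketched roughly as follows. This is an illustration only, not mediasoup code: `Layers`, `IsLower`, `ResolveTargetLayers` and the `bweAffordable` parameter are hypothetical names, and treating the bandwidth estimate as "the highest layers that fit" is a simplification of the computation in section 3.3.

```cpp
#include <cstdint>

// Hypothetical pair of layer indices (not a mediasoup type).
struct Layers
{
    int16_t spatial;
    int16_t temporal;
};

// True if a is a lower quality than b (spatial layer is compared first).
inline bool IsLower(const Layers& a, const Layers& b)
{
    return a.spatial < b.spatial || (a.spatial == b.spatial && a.temporal < b.temporal);
}

// Without bandwidth estimation the preferred layers become the target directly.
// With bandwidth estimation the preferred layers are only an upper bound:
// the target is the lower of "preferred" and "what the estimate can afford".
inline Layers ResolveTargetLayers(const Layers& preferred, bool bweEnabled, const Layers& bweAffordable)
{
    if (!bweEnabled)
        return preferred;

    return IsLower(bweAffordable, preferred) ? bweAffordable : preferred;
}
```

For instance, a viewer asking for layers 2:2 while the estimate only affords 1:1 would end up on 1:1 in this sketch, and on 2:2 when bandwidth estimation is not in use.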
The corresponding call stack is as follows:
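3.2.2 Dynamic switching by the mediasoup server
The switching logic rests on two ideas:
- Whether the stream of a given resolution is alive. A stream that has stopped has score 0 and a stream that is still being sent has score 10; these two values tell the simulcast consumer whether the stream is alive.
- The server estimates the downlink bandwidth and automatically switches the subscribing client to the stream of a suitable resolution.
The concrete trigger scenarios are:
- Every resolution stream of every Producer has its own RTC::RtpStreamRecv, which sets a timer that periodically resets the stream's score. The timer checks at regular intervals whether the stream is still being sent, so that subscribers are not left without video when a room client stops sending one of its resolution streams. The corresponding call stack is as follows:
- The congestion-control timer on the consumer's sending side periodically computes and distributes the available outgoing bitrate, which can trigger a resolution switch. The corresponding call stack is as follows:
- When mediasoup receives RTCP Transport-CC feedback from the viewer, it computes and distributes the available outgoing bitrate, which can trigger a resolution switch. The corresponding call stack is as follows:
- When an RTP packet is received, the score of that stream is set to 10 and the simulcast consumer is notified of the stream state. The corresponding call stack is as follows: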
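The keep-alive idea behind the first and last scenarios can be sketched as below. This is a simplified model, not RTC::RtpStreamRecv itself: the class name, its methods and the timeout passed to the constructor are all assumptions made for illustration, and only the two score values mentioned above (0 and 10) are modelled.

```cpp
#include <cstdint>

// Hypothetical liveness tracker for one simulcast stream (one resolution).
class StreamScoreSketch
{
public:
    explicit StreamScoreSketch(uint64_t inactivityTimeoutMs) : timeoutMs(inactivityTimeoutMs)
    {
    }

    // Called for every received RTP packet: the stream is alive.
    void OnRtpPacket(uint64_t nowMs)
    {
        this->lastPacketAtMs = nowMs;
        this->packetSeen     = true;
        this->score          = 10u;
    }

    // Called by a periodic timer: reset the score if nothing arrived recently.
    void OnTimer(uint64_t nowMs)
    {
        if (!this->packetSeen || nowMs - this->lastPacketAtMs >= this->timeoutMs)
            this->score = 0u;
    }

    uint8_t GetScore() const
    {
        return this->score;
    }

private:
    uint64_t timeoutMs;
    uint64_t lastPacketAtMs{ 0u };
    bool packetSeen{ false };
    uint8_t score{ 0u };
};
```

A consumer that observes the score dropping to 0 for its current spatial layer knows that it has to move to another layer whose stream is still alive.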
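3.3 Once a switch is triggered, how the right stream is chosen
Every room participant has a send WebRTC transport that carries the streams of all participants other than itself. Each of those streams has a corresponding Consumer, so one send WebRTC transport may hold several Consumers. Every send WebRTC transport also has its own TransportCongestionControlClient, which is responsible for working out how the bitrate is shared among all Consumers on that transport.
The main logic for choosing a stream based on network bandwidth lives in Transport::DistributeAvailableOutgoingBitrate(). Switching the stream delivered to one viewer has to take into account the bitrate allocation of all Consumers on the same send WebRTC transport.
First the total available bandwidth (availableBitrate) computed by tccClient is obtained. Bitrate is then allocated to the Consumers in priority order, highest priority first. Finally the spatialLayer and temporalLayer computed for each Consumer are applied.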
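The allocation idea can be illustrated with a much simplified model. Everything in the sketch is an assumption made for illustration (the `ConsumerSketch` type, the per-layer incremental costs, and the pass-by-pass upgrade loop); it is not the body of Transport::DistributeAvailableOutgoingBitrate().

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified consumer. layerCost[i] is the extra bitrate (bps)
// needed to move from layer i-1 to layer i; costs are assumed to be > 0.
struct ConsumerSketch
{
    uint8_t priority;                // Higher value is served first.
    std::vector<uint32_t> layerCost; // Incremental cost of each layer.
    int targetLayer;                 // -1 means no layer selected yet.
};

// Tries to move the consumer one layer up.
// Returns the bitrate used, or 0 if no upgrade is possible.
inline uint32_t IncreaseLayerSketch(ConsumerSketch& consumer, uint32_t available)
{
    const std::size_t next = static_cast<std::size_t>(consumer.targetLayer + 1);

    if (next >= consumer.layerCost.size() || consumer.layerCost[next] > available)
        return 0u;

    consumer.targetLayer = static_cast<int>(next);

    return consumer.layerCost[next];
}

// Walks the consumers from highest to lowest priority, upgrading each by one
// layer per pass, until the bitrate runs out or nobody can be upgraded.
// Returns the bitrate left over.
inline uint32_t DistributeSketch(std::vector<ConsumerSketch>& consumers, uint32_t available)
{
    std::vector<ConsumerSketch*> order;

    for (auto& consumer : consumers)
        order.push_back(&consumer);

    std::stable_sort(order.begin(), order.end(), [](const ConsumerSketch* a, const ConsumerSketch* b) {
        return a->priority > b->priority;
    });

    bool progressed = true;

    while (available > 0u && progressed)
    {
        progressed = false;

        for (auto* consumer : order)
        {
            const uint32_t used = IncreaseLayerSketch(*consumer, available);

            if (used > 0u)
            {
                available -= used;
                progressed = true;
            }
        }
    }

    return available;
}
```

With 500 bps to share between two consumers whose layers cost 100, 300 and 600 bps, both get the lowest layer first, and the remaining 300 bps then go to the higher-priority consumer for its second layer.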
The code below shows how the spatialLayer and temporalLayer of a SimulcastConsumer are computed.
uint32_t SimulcastConsumer::IncreaseLayer(uint32_t bitrate, bool considerLoss)
{
MS_TRACE();
MS_ASSERT(this->externallyManagedBitrate, "bitrate is not externally managed");
MS_ASSERT(IsActive(), "should be active");
// If already in the preferred layers, do nothing.
// clang-format off
if (
this->provisionalTargetSpatialLayer == this->preferredSpatialLayer &&
this->provisionalTargetTemporalLayer == this->preferredTemporalLayer
)
// clang-format on
{
return 0u;
}
uint32_t virtualBitrate;
if (considerLoss)
{
// Calculate virtual available bitrate based on given bitrate and our
// packet lost.
auto lossPercentage = this->rtpStream->GetLossPercentage();
if (lossPercentage < 2)
virtualBitrate = 1.08 * bitrate;
else if (lossPercentage > 10)
virtualBitrate = (1 - 0.5 * (lossPercentage / 100)) * bitrate;
else
virtualBitrate = bitrate;
}
else
{
virtualBitrate = bitrate;
}
uint32_t requiredBitrate{ 0u };
int16_t spatialLayer{ 0 };
int16_t temporalLayer{ 0 };
auto nowMs = DepLibUV::GetTimeMs();
for (size_t sIdx{ 0u }; sIdx < this->producerRtpStreams.size(); ++sIdx)
{
spatialLayer = static_cast<int16_t>(sIdx);
// If this is higher than current spatial layer and we moved to to current spatial
// layer due to BWE limitations, check how much it has elapsed since then.
if (nowMs - this->lastBweDowngradeAtMs < BweDowngradeConservativeMs)
{
if (this->provisionalTargetSpatialLayer > -1 && spatialLayer > this->currentSpatialLayer)
{
MS_DEBUG_DEV(
"avoid upgrading to spatial layer %" PRIi16 " due to recent BWE downgrade", spatialLayer);
goto done;
}
}
// Ignore spatial layers lower than the one we already have.
if (spatialLayer < this->provisionalTargetSpatialLayer)
continue;
// This can be null.
auto* producerRtpStream = this->producerRtpStreams.at(spatialLayer);
// Producer stream does not exist or it's not good. Ignore.
if (!producerRtpStream || producerRtpStream->GetScore() < StreamGoodScore)
continue;
// If the stream has not been active time enough and we have an active one
// already, move to the next spatial layer.
// clang-format off
if (
spatialLayer != this->provisionalTargetSpatialLayer &&
this->provisionalTargetSpatialLayer != -1 &&
producerRtpStream->GetActiveMs() < StreamMinActiveMs
)
// clang-format on
{
const auto* provisionalProducerRtpStream =
this->producerRtpStreams.at(this->provisionalTargetSpatialLayer);
// The stream for the current provisional spatial layer has been active
// for enough time, move to the next spatial layer.
if (provisionalProducerRtpStream->GetActiveMs() >= StreamMinActiveMs)
continue;
}
// We may not yet switch to this spatial layer.
if (!CanSwitchToSpatialLayer(spatialLayer))
continue;
temporalLayer = 0;
// Check bitrate of every temporal layer.
for (; temporalLayer < producerRtpStream->GetTemporalLayers(); ++temporalLayer)
{
// Ignore temporal layers lower than the one we already have (taking into account
// the spatial layer too).
// clang-format off
if (
spatialLayer == this->provisionalTargetSpatialLayer &&
temporalLayer <= this->provisionalTargetTemporalLayer
)
// clang-format on
{
continue;
}
requiredBitrate = producerRtpStream->GetLayerBitrate(nowMs, 0, temporalLayer);
// This is simulcast so we must substract the bitrate of the current temporal
// spatial layer if this is the temporal layer 0 of a higher spatial layer.
//
// clang-format off
if (
requiredBitrate &&
temporalLayer == 0 &&
this->provisionalTargetSpatialLayer > -1 &&
spatialLayer > this->provisionalTargetSpatialLayer
)
// clang-format on
{
auto* provisionalProducerRtpStream =
this->producerRtpStreams.at(this->provisionalTargetSpatialLayer);
auto provisionalRequiredBitrate = provisionalProducerRtpStream->GetLayerBitrate(
nowMs, 0, this->provisionalTargetTemporalLayer);
if (requiredBitrate > provisionalRequiredBitrate)
requiredBitrate -= provisionalRequiredBitrate;
else
requiredBitrate = 1u; // Don't set 0 since it would be ignored.
}
MS_DEBUG_DEV(
"testing layers %" PRIi16 ":%" PRIi16 " [virtual bitrate:%" PRIu32
", required bitrate:%" PRIu32 "]",
spatialLayer,
temporalLayer,
virtualBitrate,
requiredBitrate);
// If active layer, end iterations here. Otherwise move to next spatial layer.
if (requiredBitrate)
goto done;
else
break;
}
// If this is the preferred or higher spatial layer, take it and exit.
if (spatialLayer >= this->preferredSpatialLayer)
break;
}
done:
// No higher active layers found.
if (!requiredBitrate)
return 0u;
// No luck.
if (requiredBitrate > virtualBitrate)
return 0u;
// Set provisional layers.
this->provisionalTargetSpatialLayer = spatialLayer;
this->provisionalTargetTemporalLayer = temporalLayer;
MS_DEBUG_DEV(
"setting provisional layers to %" PRIi16 ":%" PRIi16 " [virtual bitrate:%" PRIu32
", required bitrate:%" PRIu32 "]",
this->provisionalTargetSpatialLayer,
this->provisionalTargetTemporalLayer,
virtualBitrate,
requiredBitrate);
if (requiredBitrate <= bitrate)
return requiredBitrate;
else if (requiredBitrate <= virtualBitrate)
return bitrate;
else
return requiredBitrate; // NOTE: This cannot happen.
}
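Two pieces of arithmetic in the function above are easy to miss: the loss-adjusted "virtual" bitrate at the top and the value returned at the end. The sketch below isolates them so the numbers can be checked by hand. The function names are hypothetical, and the loss percentage is taken as a plain double, which is an assumption about the type returned by GetLossPercentage().

```cpp
#include <cstdint>

// Loss-adjusted ("virtual") bitrate, using the same thresholds as the code above:
// below 2% loss the budget is stretched by 8%, above 10% loss it is reduced by
// half of the loss fraction, otherwise it is left unchanged.
inline uint32_t VirtualBitrateSketch(uint32_t bitrate, double lossPercentage)
{
    if (lossPercentage < 2)
        return static_cast<uint32_t>(1.08 * bitrate);

    if (lossPercentage > 10)
        return static_cast<uint32_t>((1 - 0.5 * (lossPercentage / 100)) * bitrate);

    return bitrate;
}

// Bitrate reported as consumed for a candidate layer needing requiredBitrate.
// Returns 0 when there is no candidate or it does not fit in the virtual bitrate.
// A layer that fits only thanks to the 8% stretch consumes the whole real bitrate.
inline uint32_t UsedBitrateSketch(uint32_t requiredBitrate, uint32_t bitrate, uint32_t virtualBitrate)
{
    if (requiredBitrate == 0u || requiredBitrate > virtualBitrate)
        return 0u;

    return requiredBitrate <= bitrate ? requiredBitrate : bitrate;
}
```

For example, with 1,000,000 bps available and 1% loss the virtual bitrate is about 1,080,000 bps, so a layer needing 1,050,000 bps is accepted and the whole 1,000,000 bps is reported as used. With 20% loss the virtual bitrate drops to about 900,000 bps.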
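3.4 Once the stream is chosen, how the viewer's playback experience is protected
The RTP streams of different resolutions use different packet sequence numbers and timestamps, so a naive switch causes problems such as audio/video desynchronization at the viewer. Protecting the viewer's experience comes down to three fields of the video RTP packet: ssrc, sequence number and timestamp. The ssrc and the sequence number are straightforward: keep the ssrc unchanged and keep the sequence numbers continuous.
The timestamp is harder, because it must keep audio and video in sync at the viewer. mediasoup simulcast solves this by applying the same principle that WebRTC uses to compute audio/video synchronization.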
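Keeping the sequence numbers continuous across a switch can be sketched as follows. The class and its methods are hypothetical and deliberately minimal (no handling of reordering or dropped packets); it only shows the offset bookkeeping, and is not how mediasoup's own sequence manager is written.

```cpp
#include <cstdint>

// Hypothetical rewriter that keeps outgoing sequence numbers continuous when
// the forwarded simulcast stream changes. Packets are assumed to arrive in order.
class SeqRewriterSketch
{
public:
    // Call when switching streams: the next input packet, whose sequence
    // number is firstInputSeq, must map to "last output + 1".
    void Sync(uint16_t firstInputSeq)
    {
        this->base = static_cast<uint16_t>(firstInputSeq - (this->lastOutput + 1));
    }

    // Maps an input sequence number to the outgoing one (modulo 2^16).
    uint16_t Rewrite(uint16_t inputSeq)
    {
        this->lastOutput = static_cast<uint16_t>(inputSeq - this->base);

        return this->lastOutput;
    }

private:
    uint16_t base{ 0u };
    uint16_t lastOutput{ 0u };
};
```

If the first stream starts at 5000 and the viewer is later switched to a stream whose next packet is 100, the viewer still sees 1, 2, 3, 4 with no gap at the switch point.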
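3.4.1 Audio/video synchronization in WebRTC
This figure comes from an article on audio/video synchronization in the WebRTC column of the Alibaba Video Cloud Technology (阿里巴巴视频云技术) WeChat official account; the article is linked as "webrtc音视频同步".
The key to audio/video synchronization in WebRTC is computing the slope (rate) and the offset of a linear equation.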
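As an illustration of that linear relation, the sketch below fits rtpTs = rate * ntpMs + offset through two Sender Report samples and then maps an RTP timestamp back to sender wall-clock time. The type and function names are hypothetical, and RTP timestamp wraparound is ignored to keep the example short.

```cpp
// One RTCP Sender Report sample: the sender's NTP wall-clock time in ms and
// the RTP timestamp that corresponds to the same instant.
struct SrSample
{
    double ntpMs;
    double rtpTs;
};

// Linear mapping rtpTs = rate * ntpMs + offset.
struct LinearMap
{
    double rate;
    double offset;
};

// Fits the line through two Sender Report samples.
// The two samples must have different NTP times.
inline LinearMap FitLinearMap(const SrSample& a, const SrSample& b)
{
    LinearMap map;

    map.rate   = (b.rtpTs - a.rtpTs) / (b.ntpMs - a.ntpMs);
    map.offset = a.rtpTs - map.rate * a.ntpMs;

    return map;
}

// Maps an RTP timestamp back to the sender's NTP time in ms.
inline double RtpToNtpMs(const LinearMap& map, double rtpTs)
{
    return (rtpTs - map.offset) / map.rate;
}
```

With a 90 kHz video clock the fitted rate is 90 ticks per millisecond. Once audio and video timestamps are both expressed in NTP time, they can be compared directly.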
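3.4.2 Computing tsOffset
The overall approach to timestamp correction when mediasoup simulcast switches streams:
- The switch to the target stream starts only when a key frame of the target stream is received.
- The switch is possible only once a SenderReport has been received for both the target stream and the current stream.
- The tsOffset between the RTP timestamps of the two streams is computed from the SenderReport information of the target stream and the current stream.
- If packet->GetTimestamp() - tsOffset <= this->rtpStream->GetMaxPacketTs(), that is, the key frame of the target stream has a timestamp less than or equal to the highest timestamp of the RTP packets already sent, an extra offset is applied to "fix" the tsOffset computed in step 3.
- Finally, the outgoing timestamp of the target stream is uint32_t timestamp = packet->GetTimestamp() - this->tsOffset
The crux is therefore to solve for this->tsOffset.
First, the tsOffset computation from step 3: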
// Calculate NTP and TS stuff.
auto ntpMs1 = producerTsReferenceRtpStream->GetSenderReportNtpMs();
auto ts1 = producerTsReferenceRtpStream->GetSenderReportTs();
auto ntpMs2 = producerTargetRtpStream->GetSenderReportNtpMs();
auto ts2 = producerTargetRtpStream->GetSenderReportTs();
int64_t diffMs;
if (ntpMs2 >= ntpMs1)
diffMs = ntpMs2 - ntpMs1;
else
diffMs = -1 * (ntpMs1 - ntpMs2);
int64_t diffTs = diffMs * this->rtpStream->GetClockRate() / 1000;
uint32_t newTs2 = ts2 - diffTs;
// Apply offset. This is the difference that later must be removed from the
// sending RTP packet.
tsOffset = newTs2 - ts1;
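To make the arithmetic concrete, the same computation is wrapped below in a standalone function that can be run with sample numbers. The function name and its parameters are hypothetical; the body follows the snippet above.

```cpp
#include <cstdint>

// (ntpMs1, ts1): Sender Report of the reference (current) stream.
// (ntpMs2, ts2): Sender Report of the target stream.
// Returns the offset to subtract from target-stream RTP timestamps so that
// they land on the reference stream's timeline (arithmetic is modulo 2^32).
inline uint32_t ComputeTsOffsetSketch(
  uint64_t ntpMs1, uint32_t ts1, uint64_t ntpMs2, uint32_t ts2, uint32_t clockRate)
{
    // Signed NTP distance between the two Sender Reports, in ms.
    const int64_t diffMs = static_cast<int64_t>(ntpMs2) - static_cast<int64_t>(ntpMs1);
    // The same distance expressed in RTP clock ticks.
    const int64_t diffTs = diffMs * static_cast<int64_t>(clockRate) / 1000;
    // Target-stream RTP timestamp at the instant of the reference Sender Report.
    const uint32_t newTs2 = static_cast<uint32_t>(static_cast<int64_t>(ts2) - diffTs);

    return newTs2 - ts1;
}
```

Take a 90000 Hz clock, a reference report (ntpMs1 = 1000, ts1 = 90000) and a target report (ntpMs2 = 1100, ts2 = 500000). Then diffMs = 100, diffTs = 9000, newTs2 = 491000 and tsOffset = 401000. A target packet stamped 500000 is therefore sent as 99000, which is exactly ts1 plus the 9000 ticks that elapsed between the two reports.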
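If the key frame of the target stream has a timestamp less than or equal to the highest timestamp of the RTP packets already sent, an extra offset, tsExtraOffset, is applied to "fix" the tsOffset computed in step 3: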
// clang-format off
if (
shouldSwitchCurrentSpatialLayer &&
(packet->GetTimestamp() - tsOffset <= this->rtpStream->GetMaxPacketTs())
)
// clang-format on
{
// Max delay in ms we allow for the stream when switching.
// https://en.wikipedia.org/wiki/Audio-to-video_synchronization#Recommendations
static const uint32_t MaxExtraOffsetMs{ 75u };
int64_t maxTsExtraOffset = MaxExtraOffsetMs * this->rtpStream->GetClockRate() / 1000;
uint32_t tsExtraOffset =
this->rtpStream->GetMaxPacketTs() - packet->GetTimestamp() + tsOffset;
// NOTE: Don't ask for a key frame if already done.
if (this->keyFrameForTsOffsetRequested)
{
// Give up and use the theoretical offset.
if (tsExtraOffset > maxTsExtraOffset)
{
MS_WARN_TAG(
simulcast,
"giving up on proper stream switching after got a requested keyframe for which still too high RTP timestamp extra offset is needed (%" PRIu32
")",
tsExtraOffset);
tsExtraOffset = 1u;
}
}
else if (tsExtraOffset > maxTsExtraOffset)
{
MS_WARN_TAG(
simulcast,
"cannot switch stream due to too high RTP timestamp extra offset needed (%" PRIu32
"), requesting keyframe",
tsExtraOffset);
RequestKeyFrameForTargetSpatialLayer();
this->keyFrameForTsOffsetRequested = true;
return;
}
// It's common that, when switching spatial layer, the resulting TS for the
// outgoing packet matches the highest seen in the previous stream. Fix it.
else if (tsExtraOffset == 0u)
{
// Apply an expected offset for a new frame in a 30fps stream.
static const uint8_t MsOffset{ 33u }; // (1 / 30 * 1000).
tsExtraOffset = MsOffset * this->rtpStream->GetClockRate() / 1000;
}
if (tsExtraOffset > 0u)
{
MS_DEBUG_TAG(
simulcast,
"RTP timestamp extra offset generated for stream switching: %" PRIu32,
tsExtraOffset);
// Increase the timestamp offset for the whole life of this Producer stream
// (until switched to a different one).
tsOffset -= tsExtraOffset;
}
}
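The decision logic above can be condensed into a pure function for experimenting with numbers. This is a sketch that mirrors the branches of the snippet; the enum, the result struct and the function name are hypothetical, and the key-frame request is reduced to a return value.

```cpp
#include <cstdint>

enum class SwitchDecision
{
    Proceed,
    RequestKeyFrame
};

struct ExtraOffsetResult
{
    SwitchDecision decision;
    uint32_t tsOffset; // Possibly corrected offset.
};

inline ExtraOffsetResult ApplyExtraOffsetSketch(
  uint32_t packetTs, uint32_t tsOffset, uint32_t maxPacketTs, uint32_t clockRate, bool keyFrameAlreadyRequested)
{
    // The rewritten timestamp is already newer than anything sent: nothing to fix.
    if (packetTs - tsOffset > maxPacketTs)
        return { SwitchDecision::Proceed, tsOffset };

    const uint32_t maxExtra = 75u * clockRate / 1000u; // 75 ms in RTP ticks.
    uint32_t extra          = maxPacketTs - packetTs + tsOffset;

    if (keyFrameAlreadyRequested)
    {
        // Give up on a proper correction and use a minimal one.
        if (extra > maxExtra)
            extra = 1u;
    }
    else if (extra > maxExtra)
    {
        // Too far behind: do not switch yet, ask for a fresh key frame.
        return { SwitchDecision::RequestKeyFrame, tsOffset };
    }
    else if (extra == 0u)
    {
        // Same timestamp as the last frame sent: advance by one 30 fps frame.
        extra = 33u * clockRate / 1000u;
    }

    return { SwitchDecision::Proceed, tsOffset - extra };
}
```

At 90000 Hz the 75 ms limit is 6750 ticks and the 33 ms step is 2970 ticks. With packetTs = 500000, tsOffset = 401000 and maxPacketTs = 99000 the extra offset is 0, so tsOffset becomes 398030 and the key frame goes out stamped 101970.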