Analysis of the mediasoup simulcast consumer

The_Old_man_and_sea

Article link: https://blog.csdn.net/The_Old_man_and_sea/article/details/114488042

1. What is simulcast
2. Current state of simulcast in mediasoup
3. The mediasoup simulcast consumer implementation

1. What is simulcast

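WebRTC's simulcast feature encodes each captured video frame into several streams, each with a different resolution and frame rate. mediasoup then picks the appropriate stream to forward to each of the other participants, based on network bandwidth or other parameters (for example, a selection made through the API).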

2. Current state of simulcast in mediasoup

In WebRTC M70 and later, the VP8, VP9, and H264 encoders all support simulcast, but at the time of writing the mediasoup SFU supports simulcast only for VP8 and H264.

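3. The mediasoup simulcast consumer implementation

[Figure: mediasoup design architecture diagram]
The figure above is the design architecture diagram from the mediasoup website. Each viewer in a room has two consumers of its own, an audio consumer and a video consumer. For an ordinary WebRTC viewer, the video consumer receives a single-resolution stream from the publisher. A simulcast consumer sits at the same level and also comes in audio and video variants, but the simulcast video consumer receives RTP streams at several resolutions from the publisher at the same time.

Because a simulcast consumer receives multiple streams and switches between them, three key problems have to be solved:

What condition triggers a switch?
How is the appropriate stream chosen for the viewer?
RTP streams at different resolutions may have different sequence numbers and timestamps, so a switch can cause problems such as audio/video desync at the viewer. How is the viewer's audio/video experience preserved?

The following sections look at how the mediasoup simulcast consumer solves these three problems.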

3.1 Terminology

spatialLayer	Resolution
temporalLayer	Frame rate
score	Quality of the stream, from 0 to 10; 0 means the stream is no longer being sent

For example:
Spatial Layer (0): 320x240
Spatial Layer (1): 640x480
Spatial Layer (2): 1280x720
Temporal Layer (0): 3
Temporal Layer (1): 11
Temporal Layer (2): 14

3.2 When a switch is triggered

3.2.1 Manual selection by the viewer

The corresponding mediasoup JSON request:

{
    "id":211,
    "method":"consumer.setPreferredLayers",
    "internal":{
        "routerId":"6483a8a3-fa90-49f7-9f02-c9fa2e93e8f4",
        "transportId":"88636bae-3aa7-478d-b187-2d4d420daac7",
        "consumerId":"b93057a9-c355-4ee1-98d7-2e7757c0f5ec",
        "producerId":"cb9c8eea-fd6f-4b22-bd29-1a29b61c4304"
    },
    "data":{
        "spatialLayer":2,
        "temporalLayer":2
    }
}

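Without bandwidth estimation, the target layers are set to the spatialLayer and temporalLayer the viewer asked for, and the viewer switches directly to the selected stream.

With bandwidth estimation, the requested spatialLayer and temporalLayer are only stored as the preferred layers. The final target spatialLayer and temporalLayer are computed from the available bandwidth, as described in section 3.3.

The corresponding call stack:
[Figure: call stack]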

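3.2.2 Dynamic switching by the mediasoup server

The switching logic rests on two points:

Whether each resolution's stream is still alive. A stream that has stopped has score 0 and a stream that is still being sent has score 10; these two values tell the simulcast consumer whether the stream is alive.
The server computes a bandwidth estimate for the downlink and automatically switches the subscribing client to a stream of a suitable resolution.

The concrete trigger scenarios are:

Every resolution stream of every Producer has its own RTC::RtpStreamRecv, which runs a timer that periodically resets the stream's score. The timer checks whether the stream is still being sent, so that a room client that stops sending one resolution does not leave subscribers without video.
The congestion control timer on the consumer's sending side periodically computes and distributes the available outgoing bitrate, which can trigger a resolution switch.
When mediasoup receives RTCP Transport-CC feedback from the viewer, it recomputes and distributes the available outgoing bitrate, which can trigger a resolution switch.
When an RTP packet is received, the stream's score is set to 10 and the simulcast consumer is notified of the stream's state.

The original post illustrates each of these scenarios with a call stack screenshot.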
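The keep-alive idea behind the score can be sketched as follows. This is only an illustration of the 0/10 behaviour described above, not mediasoup's code: the class name StreamLiveness, its methods, and the timeout value are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: tracks whether one simulcast stream is still alive.
class StreamLiveness
{
public:
	explicit StreamLiveness(uint64_t inactivityTimeoutMs) : timeoutMs(inactivityTimeoutMs)
	{
		assert(inactivityTimeoutMs > 0);
	}

	// Called for every received RTP packet: the stream is alive, score becomes 10.
	void OnRtpPacket(uint64_t nowMs)
	{
		this->lastPacketAtMs = nowMs;
		this->score          = 10;
	}

	// Called periodically by a timer: reset the score to 0 if no packet
	// arrived within the timeout.
	void OnTimer(uint64_t nowMs)
	{
		if (this->score != 0 && nowMs - this->lastPacketAtMs >= this->timeoutMs)
			this->score = 0;
	}

	uint8_t GetScore() const
	{
		return this->score;
	}

private:
	uint64_t timeoutMs;
	uint64_t lastPacketAtMs{ 0 };
	uint8_t score{ 0 };
};
```

A consumer that sees the score of its current stream drop to 0 knows it has to move to another resolution, which is the case the timer is there to catch.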

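3.3 Once a switch is triggered, how the appropriate stream is chosen

Every room participant has a send WebRTC transport that carries the streams of all other participants to it. Each of those streams has its own Consumer, so one send WebRTC transport may hold several Consumers. Each send WebRTC transport also has a TransportCongestionControlClient, which is responsible for working out how bitrate is shared among all Consumers on that transport.

The main logic for choosing a stream based on network bandwidth is in Transport::DistributeAvailableOutgoingBitrate(). Switching the stream of one Consumer has to take into account the bitrate allocation of every Consumer on the same send WebRTC transport.

The function first reads the total available bandwidth (availableBitrate) computed by tccClient, then allocates bitrate to Consumers in priority order, highest priority first, and finally applies the spatialLayer and temporalLayer computed for each Consumer.

The code below shows how SimulcastConsumer computes its spatialLayer and temporalLayer.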
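The priority-ordered allocation can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the body of Transport::DistributeAvailableOutgoingBitrate(): SketchConsumer, its stepCosts table, and Distribute() are hypothetical, and the real function works on live stream bitrates rather than a fixed cost table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-in for a consumer. stepCosts[i] is the extra bitrate
// (in bps) needed to move up one more layer from level i.
struct SketchConsumer
{
	std::vector<uint32_t> stepCosts;
	size_t provisionalLayers{ 0 }; // Number of upgrades granted so far.

	// Tries one upgrade; returns the bitrate consumed, or 0 if not possible.
	uint32_t IncreaseLayer(uint32_t availableBitrate)
	{
		if (this->provisionalLayers >= this->stepCosts.size())
			return 0u;

		uint32_t cost = this->stepCosts[this->provisionalLayers];

		if (cost > availableBitrate)
			return 0u;

		++this->provisionalLayers;

		return cost;
	}
};

// Consumers keyed by priority, highest priority first.
using ConsumersByPriority = std::multimap<uint8_t, SketchConsumer*, std::greater<uint8_t>>;

// Hands out availableBitrate in rounds, one upgrade per consumer per round,
// until the bitrate runs out or nobody can be upgraded any further.
void Distribute(ConsumersByPriority& consumers, uint32_t availableBitrate)
{
	bool upgraded = true;

	while (availableBitrate > 0 && upgraded)
	{
		upgraded = false;

		for (auto& kv : consumers)
		{
			uint32_t used = kv.second->IncreaseLayer(availableBitrate);

			if (used == 0u)
				continue;

			availableBitrate -= used;
			upgraded = true;
		}
	}
}
```

With 600 kbps to share between two consumers that each need 100, 300 and 900 kbps for successive upgrades, the higher-priority consumer ends up two levels up and the lower-priority one a single level up.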

uint32_t SimulcastConsumer::IncreaseLayer(uint32_t bitrate, bool considerLoss)
	{
		MS_TRACE();

		MS_ASSERT(this->externallyManagedBitrate, "bitrate is not externally managed");
		MS_ASSERT(IsActive(), "should be active");

		// If already in the preferred layers, do nothing.
		// clang-format off
		if (
			this->provisionalTargetSpatialLayer == this->preferredSpatialLayer &&
			this->provisionalTargetTemporalLayer == this->preferredTemporalLayer
		)
		// clang-format on
		{
			return 0u;
		}

		uint32_t virtualBitrate;

		if (considerLoss)
		{
			// Calculate virtual available bitrate based on given bitrate and our
			// packet lost.
			auto lossPercentage = this->rtpStream->GetLossPercentage();

			if (lossPercentage < 2)
				virtualBitrate = 1.08 * bitrate;
			else if (lossPercentage > 10)
				virtualBitrate = (1 - 0.5 * (lossPercentage / 100)) * bitrate;
			else
				virtualBitrate = bitrate;
		}
		else
		{
			virtualBitrate = bitrate;
		}

		uint32_t requiredBitrate{ 0u };
		int16_t spatialLayer{ 0 };
		int16_t temporalLayer{ 0 };
		auto nowMs = DepLibUV::GetTimeMs();

		for (size_t sIdx{ 0u }; sIdx < this->producerRtpStreams.size(); ++sIdx)
		{
			spatialLayer = static_cast<int16_t>(sIdx);

			// If this is higher than current spatial layer and we moved to to current spatial
			// layer due to BWE limitations, check how much it has elapsed since then.
			if (nowMs - this->lastBweDowngradeAtMs < BweDowngradeConservativeMs)
			{
				if (this->provisionalTargetSpatialLayer > -1 && spatialLayer > this->currentSpatialLayer)
				{
					MS_DEBUG_DEV(
					  "avoid upgrading to spatial layer %" PRIi16 " due to recent BWE downgrade", spatialLayer);

					goto done;
				}
			}

			// Ignore spatial layers lower than the one we already have.
			if (spatialLayer < this->provisionalTargetSpatialLayer)
				continue;

			// This can be null.
			auto* producerRtpStream = this->producerRtpStreams.at(spatialLayer);

			// Producer stream does not exist or it's not good. Ignore.
			if (!producerRtpStream || producerRtpStream->GetScore() < StreamGoodScore)
				continue;

			// If the stream has not been active time enough and we have an active one
			// already, move to the next spatial layer.
			// clang-format off
			if (
				spatialLayer != this->provisionalTargetSpatialLayer &&
				this->provisionalTargetSpatialLayer != -1 &&
				producerRtpStream->GetActiveMs() < StreamMinActiveMs
			)
			// clang-format on
			{
				const auto* provisionalProducerRtpStream =
				  this->producerRtpStreams.at(this->provisionalTargetSpatialLayer);

				// The stream for the current provisional spatial layer has been active
				// for enough time, move to the next spatial layer.
				if (provisionalProducerRtpStream->GetActiveMs() >= StreamMinActiveMs)
					continue;
			}

			// We may not yet switch to this spatial layer.
			if (!CanSwitchToSpatialLayer(spatialLayer))
				continue;

			temporalLayer = 0;

			// Check bitrate of every temporal layer.
			for (; temporalLayer < producerRtpStream->GetTemporalLayers(); ++temporalLayer)
			{
				// Ignore temporal layers lower than the one we already have (taking into account
				// the spatial layer too).
				// clang-format off
				if (
					spatialLayer == this->provisionalTargetSpatialLayer &&
					temporalLayer <= this->provisionalTargetTemporalLayer
				)
				// clang-format on
				{
					continue;
				}

				requiredBitrate = producerRtpStream->GetLayerBitrate(nowMs, 0, temporalLayer);

				// This is simulcast so we must substract the bitrate of the current temporal
				// spatial layer if this is the temporal layer 0 of a higher spatial layer.
				//
				// clang-format off
				if (
					requiredBitrate &&
					temporalLayer == 0 &&
					this->provisionalTargetSpatialLayer > -1 &&
					spatialLayer > this->provisionalTargetSpatialLayer
				)
				// clang-format on
				{
					auto* provisionalProducerRtpStream =
					  this->producerRtpStreams.at(this->provisionalTargetSpatialLayer);
					auto provisionalRequiredBitrate = provisionalProducerRtpStream->GetLayerBitrate(
					  nowMs, 0, this->provisionalTargetTemporalLayer);

					if (requiredBitrate > provisionalRequiredBitrate)
						requiredBitrate -= provisionalRequiredBitrate;
					else
						requiredBitrate = 1u; // Don't set 0 since it would be ignored.
				}

				MS_DEBUG_DEV(
				  "testing layers %" PRIi16 ":%" PRIi16 " [virtual bitrate:%" PRIu32
				  ", required bitrate:%" PRIu32 "]",
				  spatialLayer,
				  temporalLayer,
				  virtualBitrate,
				  requiredBitrate);

				// If active layer, end iterations here. Otherwise move to next spatial layer.
				if (requiredBitrate)
					goto done;
				else
					break;
			}

			// If this is the preferred or higher spatial layer, take it and exit.
			if (spatialLayer >= this->preferredSpatialLayer)
				break;
		}

	done:

		// No higher active layers found.
		if (!requiredBitrate)
			return 0u;

		// No luck.
		if (requiredBitrate > virtualBitrate)
			return 0u;

		// Set provisional layers.
		this->provisionalTargetSpatialLayer  = spatialLayer;
		this->provisionalTargetTemporalLayer = temporalLayer;

		MS_DEBUG_DEV(
		  "setting provisional layers to %" PRIi16 ":%" PRIi16 " [virtual bitrate:%" PRIu32
		  ", required bitrate:%" PRIu32 "]",
		  this->provisionalTargetSpatialLayer,
		  this->provisionalTargetTemporalLayer,
		  virtualBitrate,
		  requiredBitrate);

		if (requiredBitrate <= bitrate)
			return requiredBitrate;
		else if (requiredBitrate <= virtualBitrate)
			return bitrate;
		else
			return requiredBitrate; // NOTE: This cannot happen.
	}
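The loss-based adjustment at the top of IncreaseLayer() is easier to follow with numbers. The helper below isolates that branch; the function name ComputeVirtualBitrate is hypothetical, and it assumes the loss percentage is a float in the range 0 to 100, as the division by 100 in the code above suggests.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the considerLoss branch of IncreaseLayer():
//  - loss below 2%: be optimistic and allow 8% more than the given bitrate,
//  - loss above 10%: shrink the bitrate by half of the loss ratio,
//  - otherwise: use the bitrate unchanged.
uint32_t ComputeVirtualBitrate(uint32_t bitrate, float lossPercentage)
{
	assert(lossPercentage >= 0 && lossPercentage <= 100);

	if (lossPercentage < 2)
		return static_cast<uint32_t>(1.08 * bitrate);
	else if (lossPercentage > 10)
		return static_cast<uint32_t>((1 - 0.5 * (lossPercentage / 100)) * bitrate);
	else
		return bitrate;
}
```

For a given bitrate of 1,000,000 bps, 1% loss yields a virtual bitrate of 1,080,000 bps, 5% loss leaves it at 1,000,000 bps, and 12.5% loss reduces it to 937,500 bps. The candidate layer is accepted only if its required bitrate fits within this virtual bitrate.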

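3.4 After the stream is chosen, how the viewer's playback experience is preserved

RTP streams at different resolutions have different sequence numbers and timestamps, so a switch can cause problems such as audio/video desync at the viewer. Preserving the experience comes down to three fields of the video RTP packet: SSRC, sequence number, and timestamp. SSRC and sequence number are easy to handle: keep the SSRC unchanged and keep the sequence numbers continuous.

The timestamp is trickier, because it has to keep audio and video in sync at the viewer. mediasoup simulcast solves this with the same principle WebRTC uses to compute audio/video sync.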
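Keeping the sequence numbers continuous across a switch can be sketched like this. It is a minimal illustration that assumes packets arrive in order and none are dropped; SeqRewriter is a hypothetical class and is not how mediasoup implements it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: rewrites input sequence numbers so that the output
// stays continuous when the input stream changes.
class SeqRewriter
{
public:
	// Call when switching to a new input stream. firstSeq is the sequence
	// number of the first packet that will be forwarded from that stream.
	void Sync(uint16_t firstSeq)
	{
		// Choose the offset so that the next output is lastOutput + 1.
		// Arithmetic wraps modulo 2^16, like RTP sequence numbers do.
		this->offset = static_cast<uint16_t>(firstSeq - (this->lastOutput + 1));
	}

	// Returns the sequence number to put on the outgoing packet.
	uint16_t Rewrite(uint16_t inputSeq)
	{
		this->lastOutput = static_cast<uint16_t>(inputSeq - this->offset);

		return this->lastOutput;
	}

private:
	uint16_t offset{ 0 };
	uint16_t lastOutput{ 0 };
};
```

The outgoing SSRC is simply fixed per consumer, so the viewer sees one uninterrupted RTP stream regardless of which resolution is currently being forwarded.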

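3.4.1 How WebRTC computes audio/video sync

[Figure: WebRTC audio/video sync diagram]
This figure comes from an article on audio/video sync in the WebRTC column of the Alibaba Video Cloud Technology WeChat official account, linked in the original post as "WebRTC audio/video sync".

The key to WebRTC audio/video sync is computing the slope (rate) and the offset of a linear equation.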
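One way to picture that linear equation is to treat each RTCP Sender Report as a point pairing an NTP time with an RTP timestamp, and fit a line through two such points. The sketch below writes the line as ntpMs = rate * rtpTs + offset; this form, and the names SenderReportPoint, LinearMapping, Fit and RtpToNtpMs, are assumptions made for illustration and may differ from the notation in the referenced figure.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// One Sender Report: the same instant expressed in NTP milliseconds and in
// (unwrapped) RTP timestamp units.
struct SenderReportPoint
{
	int64_t ntpMs;
	int64_t rtpTs;
};

// Line through two reports: ntpMs = rate * rtpTs + offset.
struct LinearMapping
{
	double rate;   // Milliseconds per RTP tick.
	double offset; // NTP milliseconds at RTP timestamp 0.
};

LinearMapping Fit(const SenderReportPoint& a, const SenderReportPoint& b)
{
	assert(a.rtpTs != b.rtpTs);

	double rate   = static_cast<double>(b.ntpMs - a.ntpMs) / static_cast<double>(b.rtpTs - a.rtpTs);
	double offset = static_cast<double>(a.ntpMs) - rate * static_cast<double>(a.rtpTs);

	return { rate, offset };
}

// Maps any RTP timestamp of that stream onto the shared NTP timeline.
double RtpToNtpMs(const LinearMapping& m, int64_t rtpTs)
{
	return m.rate * static_cast<double>(rtpTs) + m.offset;
}
```

Once every stream can be mapped onto the same NTP timeline this way, timestamps from different streams become comparable, which is the property the tsOffset computation in the next section relies on.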

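3.4.2 Computing tsOffset

The overall approach mediasoup simulcast takes to correct timestamps when switching streams:

1. Switch to the target stream only when a key frame of the target stream is received.
2. Switch only once a Sender Report has been received for both the target stream and the current stream.
3. From the Sender Reports of the target stream and the current stream, compute tsOffset, the offset between the RTP timestamps of the two streams.
4. If packet->GetTimestamp() - tsOffset <= this->rtpStream->GetMaxPacketTs(), that is, the corrected timestamp of the target stream's key frame is less than or equal to the highest timestamp already sent, apply an extra offset to "fix" the tsOffset computed in step 3.
5. Finally, for the target stream, uint32_t timestamp = packet->GetTimestamp() - this->tsOffset.

So the crux is solving for this->tsOffset.

First, the tsOffset computation from step 3: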

// Calculate NTP and TS stuff.
auto ntpMs1 = producerTsReferenceRtpStream->GetSenderReportNtpMs();
auto ts1    = producerTsReferenceRtpStream->GetSenderReportTs();
auto ntpMs2 = producerTargetRtpStream->GetSenderReportNtpMs();
auto ts2    = producerTargetRtpStream->GetSenderReportTs();
int64_t diffMs;

if (ntpMs2 >= ntpMs1)
	diffMs = ntpMs2 - ntpMs1;
else
	diffMs = -1 * (ntpMs1 - ntpMs2);

int64_t diffTs  = diffMs * this->rtpStream->GetClockRate() / 1000;
uint32_t newTs2 = ts2 - diffTs;

// Apply offset. This is the difference that later must be removed from the
// sending RTP packet.
tsOffset = newTs2 - ts1;
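A worked example makes the arithmetic concrete. The helper below repeats the same computation as a free function; ComputeTsOffset and its parameter list are hypothetical, introduced only so the numbers can be checked in isolation.

```cpp
#include <cassert>
#include <cstdint>

// Same arithmetic as the snippet above. Stream 1 is the timestamp reference
// stream, stream 2 is the target stream. All uint32_t math wraps modulo 2^32,
// like RTP timestamps do.
uint32_t ComputeTsOffset(
  uint64_t ntpMs1, uint32_t ts1, uint64_t ntpMs2, uint32_t ts2, uint32_t clockRate)
{
	assert(clockRate > 0);

	// Signed distance between the two Sender Reports, in milliseconds.
	int64_t diffMs = static_cast<int64_t>(ntpMs2) - static_cast<int64_t>(ntpMs1);
	// The same distance expressed in RTP ticks.
	int64_t diffTs = diffMs * clockRate / 1000;
	// Timestamp the target stream had at the instant of the reference report.
	uint32_t newTs2 = ts2 - static_cast<uint32_t>(diffTs);

	return newTs2 - ts1;
}
```

With a 90000 Hz clock, a reference report of (ntpMs1 = 10000, ts1 = 900000) and a target report of (ntpMs2 = 10500, ts2 = 5000000), diffTs is 45000 ticks, newTs2 is 4955000 and tsOffset is 4055000. A target packet stamped 5000000 is then sent as 5000000 - 4055000 = 945000, which is exactly ts1 + 45000: the value the reference stream would carry at that same instant.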

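If the target stream's key frame timestamp is less than or equal to the highest timestamp of the RTP packets already sent, an extra offset, tsExtraOffset, is applied to "fix" the tsOffset computed in step 3: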

if (
				shouldSwitchCurrentSpatialLayer &&
				(packet->GetTimestamp() - tsOffset <= this->rtpStream->GetMaxPacketTs())
			)
			// clang-format on
			{
				// Max delay in ms we allow for the stream when switching.
				// https://en.wikipedia.org/wiki/Audio-to-video_synchronization#Recommendations
				static const uint32_t MaxExtraOffsetMs{ 75u };

				int64_t maxTsExtraOffset = MaxExtraOffsetMs * this->rtpStream->GetClockRate() / 1000;
				uint32_t tsExtraOffset =
				  this->rtpStream->GetMaxPacketTs() - packet->GetTimestamp() + tsOffset;

				// NOTE: Don't ask for a key frame if already done.
				if (this->keyFrameForTsOffsetRequested)
				{
					// Give up and use the theoretical offset.
					if (tsExtraOffset > maxTsExtraOffset)
					{
						MS_WARN_TAG(
						  simulcast,
						  "giving up on proper stream switching after got a requested keyframe for which still too high RTP timestamp extra offset is needed (%" PRIu32
						  ")",
						  tsExtraOffset);

						tsExtraOffset = 1u;
					}
				}
				else if (tsExtraOffset > maxTsExtraOffset)
				{
					MS_WARN_TAG(
					  simulcast,
					  "cannot switch stream due to too high RTP timestamp extra offset needed (%" PRIu32
					  "), requesting keyframe",
					  tsExtraOffset);

					RequestKeyFrameForTargetSpatialLayer();

					this->keyFrameForTsOffsetRequested = true;

					return;
				}
				// It's common that, when switching spatial layer, the resulting TS for the
				// outgoing packet matches the highest seen in the previous stream. Fix it.
				else if (tsExtraOffset == 0u)
				{
					// Apply an expected offset for a new frame in a 30fps stream.
					static const uint8_t MsOffset{ 33u }; // (1 / 30 * 1000).

					tsExtraOffset = MsOffset * this->rtpStream->GetClockRate() / 1000;
				}

				if (tsExtraOffset > 0u)
				{
					MS_DEBUG_TAG(
					  simulcast,
					  "RTP timestamp extra offset generated for stream switching: %" PRIu32,
					  tsExtraOffset);

					// Increase the timestamp offset for the whole life of this Producer stream
					// (until switched to a different one).
					tsOffset -= tsExtraOffset;
				}
			}
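The branching on tsExtraOffset in the snippet above can be condensed into a small decision function. This is a sketch that mirrors those branches for illustration; SwitchAction, ExtraOffsetResult and ResolveExtraOffset are hypothetical names, while the 75 ms limit and the 33 ms frame interval are taken from the code above.

```cpp
#include <cassert>
#include <cstdint>

enum class SwitchAction
{
	Proceed,        // Switch now, subtracting tsExtraOffset from tsOffset.
	RequestKeyFrame // Do not switch yet; ask the target stream for a new key frame.
};

struct ExtraOffsetResult
{
	SwitchAction action;
	uint32_t tsExtraOffset;
};

ExtraOffsetResult ResolveExtraOffset(
  uint32_t tsExtraOffset, uint32_t clockRate, bool keyFrameAlreadyRequested)
{
	assert(clockRate > 0);

	const uint32_t maxTsExtraOffset = 75u * clockRate / 1000; // 75 ms in RTP ticks.

	if (keyFrameAlreadyRequested)
	{
		// Second attempt and still too far off: give up and use a minimal offset.
		if (tsExtraOffset > maxTsExtraOffset)
			tsExtraOffset = 1u;
	}
	else if (tsExtraOffset > maxTsExtraOffset)
	{
		// First attempt and too far off: wait for a fresher key frame.
		return { SwitchAction::RequestKeyFrame, tsExtraOffset };
	}
	else if (tsExtraOffset == 0u)
	{
		// Timestamp would equal the highest one already sent: advance by one
		// frame interval of a 30 fps stream.
		tsExtraOffset = 33u * clockRate / 1000;
	}

	return { SwitchAction::Proceed, tsExtraOffset };
}
```

At a 90000 Hz clock the limit is 6750 ticks and the one-frame bump is 2970 ticks, so an extra offset of 3000 is applied as is, 0 becomes 2970, and 9000 triggers a key frame request the first time and falls back to 1 the second time.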