Live Stream Procedure

In Part 1 we talked about how developers used to struggle to broadcast from an iOS device. Now we have direct access to the compressed data, and this access gives us the freedom we had dreamt about: a qualitatively better way of preparing frames for broadcasting. In general, the process can be split into three steps.


Step 1. Video capturing. The camera captures the video, and the device wraps the media samples and their metadata into CMSampleBuffer packages.

Step 2. Video compressing. The captured data is compressed with the help of VideoToolbox, which compresses the data carried inside the CMSampleBuffer packages.

Step 3. Converting the compressed data into NALUs to make it suitable for online streaming.

[Diagram: the three steps of the streaming process]

Let’s Walk Through The Process Step By Step

There are numerous documents and examples explaining how the first step is done, so we only want to draw your attention to the fact that at this stage we receive a stream of CMSampleBuffers containing uncompressed CVPixelBuffer data.
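For illustration, here is a minimal Swift sketch of the capture side (simplified rather than production code; the class and queue names are ours). The delegate receives CMSampleBuffers whose image data is an uncompressed CVPixelBuffer.

```swift
import AVFoundation

final class CaptureSource: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    private let captureQueue = DispatchQueue(label: "capture.queue")

    func start() throws {
        guard let camera = AVCaptureDevice.default(for: .video) else { return }
        let input = try AVCaptureDeviceInput(device: camera)
        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(self, queue: captureQueue)

        session.beginConfiguration()
        if session.canAddInput(input) { session.addInput(input) }
        if session.canAddOutput(output) { session.addOutput(output) }
        session.commitConfiguration()
        session.startRunning()
    }

    // Step 1 output: a CMSampleBuffer whose image data is an uncompressed CVPixelBuffer.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        _ = (pixelBuffer, pts) // hand these to the compression session in Step 2
    }
}
```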

During the second stage we need to create and configure a VTCompressionSessionRef. To compress an input frame we call the VTCompressionSessionEncodeFrame function, passing it the CVPixelBuffer extracted from the captured CMSampleBuffer. When the operation finishes, the encoder invokes a callback function that we set up while initializing the VTCompressionSessionRef. As a result we receive a new CMSampleBuffer package that now contains compressed data: it is the same kind of CMSampleBuffer stream, but it carries CMBlockBuffer structures with the compressed video.
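Here is a minimal Swift sketch of that setup (simplified; property tuning and error handling are omitted, and the helper names are ours). The callback receives the new CMSampleBuffer with the compressed data.

```swift
import VideoToolbox

var compressionSession: VTCompressionSession?

// Called by the encoder when a frame has been compressed.
let outputCallback: VTCompressionOutputCallback = { _, _, status, _, sampleBuffer in
    guard status == noErr, let sampleBuffer = sampleBuffer else { return }
    // sampleBuffer now carries CMBlockBuffer data in AVCC format; Step 3 converts it to Annex B.
    _ = sampleBuffer
}

func makeSession(width: Int32, height: Int32) {
    VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                               width: width,
                               height: height,
                               codecType: kCMVideoCodecType_H264,
                               encoderSpecification: nil,
                               imageBufferAttributes: nil,
                               compressedDataAllocator: nil,
                               outputCallback: outputCallback,
                               refcon: nil,
                               compressionSessionOut: &compressionSession)
    if let session = compressionSession {
        // Ask for real-time encoding, since the frames are meant for live streaming.
        VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue)
    }
}

func encode(_ pixelBuffer: CVPixelBuffer, pts: CMTime) {
    guard let session = compressionSession else { return }
    VTCompressionSessionEncodeFrame(session,
                                    imageBuffer: pixelBuffer,
                                    presentationTimeStamp: pts,
                                    duration: .invalid,
                                    frameProperties: nil,
                                    sourceFrameRefcon: nil,
                                    infoFlagsOut: nil)
}
```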

The next step is to convert the stream of CMSampleBuffers into a stream of NALUs (Network Abstraction Layer Units), as is usually done when working with an H.264 encoder. An H.264 stream can come in two different formats – Annex B and AVCC. Apple calls the format most commonly used for streaming “Elementary Stream”; we will call it Annex B. The iOS media libraries work with H.264 in the AVCC format (an MPEG-4 stream). Regardless of the format, there are 19 different types of NALUs, and each NALU can carry one of two kinds of data: VCL (Video Coding Layer) data or metadata. Each unit can be easily parsed and processed because it has a descriptive header. The core difference between Annex B and AVCC lies in how NALUs are delimited within the video stream.

Two Formats – One Way

In Annex B a NALU does not carry its own size; instead, it begins with a start code, usually 0x000001 or 0x00000001 (3 or 4 bytes). The start codes are what allow the stream to be split into individual NALUs.

AVCC defines the size of each NALU with a length header that precedes the NALU itself. The header is usually 4 bytes long, but may be shorter.

Each CMSampleBuffer package with compressed data contains the following:

  • PTS (CMTime) – the presentation timestamp
  • Format description (CMVideoFormatDescription) – describes the format of the video
  • Block buffer (CMBlockBuffer) – contains part of, or a whole, compressed frame

A CMSampleBuffer stream is a stream of I-, B-, and P-frames, and each buffer may contain one or more NALUs in AVCC format. An Annex B stream, which is used for transmission, is a sequence of SPS (Sequence Parameter Set), PPS (Picture Parameter Set), I-frame, B-frame, and P-frame NALUs. The number of P- and B-frames may vary.

SPS and PPS contain the parameters needed to decode the stream and must precede each I-frame.


For each NALU, the length header is replaced with a start code before the unit is appended to the output stream. To determine the length and number of NALUs inside the CMBlockBuffer data, we read the length header that precedes each NALU.

The length is encoded in big-endian byte order, so we need to swap its bytes to get the correct NALU length on a little-endian device. When a CMSampleBuffer contains an I-frame, we build SPS and PPS NALUs from the frame’s format description and place them before the other NALUs taken from the CMBlockBuffer of that CMSampleBuffer.
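Below is a simplified Swift sketch of this conversion. It assumes 4-byte AVCC length headers, leaves key-frame detection to the caller, and the function name is ours:

```swift
import Foundation
import CoreMedia

private let startCode: [UInt8] = [0x00, 0x00, 0x00, 0x01]

// Hypothetical helper: converts one compressed CMSampleBuffer (AVCC) to Annex B bytes.
func annexBData(from sampleBuffer: CMSampleBuffer, isKeyFrame: Bool) -> Data {
    var output = Data()

    // 1. SPS and PPS must precede every I-frame; they live in the format description.
    if isKeyFrame, let format = CMSampleBufferGetFormatDescription(sampleBuffer) {
        for index in 0..<2 { // parameter set 0 = SPS, 1 = PPS
            var pointer: UnsafePointer<UInt8>?
            var size = 0
            CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
                format,
                parameterSetIndex: index,
                parameterSetPointerOut: &pointer,
                parameterSetSizeOut: &size,
                parameterSetCountOut: nil,
                nalUnitHeaderLengthOut: nil)
            if let pointer = pointer {
                output.append(contentsOf: startCode)
                output.append(pointer, count: size)
            }
        }
    }

    // 2. Walk the block buffer, replacing each 4-byte length header with a start code.
    guard let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return output }
    var totalLength = 0
    var dataPointer: UnsafeMutablePointer<Int8>?
    CMBlockBufferGetDataPointer(blockBuffer, atOffset: 0, lengthAtOffsetOut: nil,
                                totalLengthOut: &totalLength, dataPointerOut: &dataPointer)

    var offset = 0
    while let base = dataPointer, offset + 4 <= totalLength {
        // The length is stored big-endian, so convert it to the host byte order.
        var nalLength: UInt32 = 0
        memcpy(&nalLength, base + offset, 4)
        nalLength = UInt32(bigEndian: nalLength)

        output.append(contentsOf: startCode)
        base.withMemoryRebound(to: UInt8.self, capacity: totalLength) { bytes in
            output.append(bytes + offset + 4, count: Int(nalLength))
        }
        offset += 4 + Int(nalLength)
    }
    return output
}
```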

All of this results in an H.264 Annex B stream that is ready to be broadcast and displayed on other devices!

 

In the first and second parts we examined real-time video streaming, how it is done on iOS, and took the process apart piece by piece. And now:

Let’s Compare The Results Of Video Encoding On Various iOS Devices!

Disclaimer

To conduct an informative experiment comparing the various video encoding methods, we created a test environment: an application that allows us to measure the results properly.

The app uses three encoders:

  1. Hardware – accessed via the VideoToolbox library.
  2. Hardware – accessed via AVAssetWriter. For this purpose we used the implementation from the Kickflip library, an open-source broadcasting solution for iOS applications.
  3. Software – the ffmpeg 3.0 library compiled together with x264 as a dependency.


We wanted to find the limits of each method, so we ran the tests at several resolutions supported by AVCaptureSession: 352×288, 640×480, 1280×720, 1920×1080, and 3840×2160. The handheld devices chosen for the tests were the iPhone 4S (the weakest device running iOS 8), the iPhone 6 Plus, and the iPad Air 2 (one of the most powerful devices on the market at the time). During the tests we measured CPU usage and the delay introduced when encoding video into the H.264 format.
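As an illustration of how such a per-frame delay can be measured (an assumed approach, not necessarily the exact instrumentation we used): record a timestamp when a frame is handed to the encoder, keyed by its presentation time, and subtract it when the compressed frame comes back.

```swift
import Foundation
import CoreMedia
import QuartzCore

// Records when each frame enters the encoder (keyed by its presentation time in
// seconds) and computes the delay when the compressed frame comes back out.
final class EncodeLatencyProbe {
    private var submitted: [Double: CFTimeInterval] = [:]
    private let lock = NSLock()

    func frameSubmitted(pts: CMTime) {
        lock.lock(); defer { lock.unlock() }
        submitted[pts.seconds] = CACurrentMediaTime()
    }

    // Returns the encoding delay for this frame in seconds, if we saw it go in.
    func frameEncoded(pts: CMTime) -> CFTimeInterval? {
        lock.lock(); defer { lock.unlock() }
        guard let start = submitted.removeValue(forKey: pts.seconds) else { return nil }
        return CACurrentMediaTime() - start
    }
}
```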

There is a certain margin of error, and the numbers may differ from those obtained in a different test environment. Nevertheless, our results illustrate the difference in encoding efficiency across devices quite well.

iPhone 4S:

| Resolution | ffmpeg with x264 (software) | AVAssetWriter (Kickflip implementation) | VideoToolbox |
| --- | --- | --- | --- |
| 352×288 | delay: ~0.60 s; CPU: ~55–60% | delay: ~0.39–0.46 s; CPU: ~8–12% | delay: ~0.07–0.087 s; CPU: ~7–9% |
| 640×480 | delay: ~0.75–0.85 s; CPU: ~130–160% | delay: ~0.46–0.5 s; CPU: ~8–12% | delay: ~0.067–0.087 s; CPU: ~7–9% |
| 1280×720 | delay: ~1.55–1.63 s; CPU: >160% | delay: ~0.688–0.77 s; CPU: ~8–12% | delay: ~0.114–0.118 s; CPU: ~7–9% |
| 1920×1080 | delay: ~3.8 s; CPU: >160% | delay: ~0.84–0.88 s; CPU: ~8–12% | delay: ~0.177–0.181 s; CPU: ~7–9% |

It is clear that the software encoder overloads the CPU and fails to deliver reasonable results even at 640×480. Notice that the CPU load exceeds 100% (more than one core is saturated), which means some frames will be dropped because the CPU cannot process them in time. Naturally, the battery will also drain faster.

The hardware encoders, on the other hand, performed wonderfully at every resolution: average CPU usage stayed within 7–12%. We did find that AVAssetWriter introduces a noticeably longer delay than VideoToolbox.

iPhone 6 Plus and iPad Air 2

Here are the results after testing the devices.

iPhone 6 Plus:

| Resolution | ffmpeg with x264 (software) | AVAssetWriter (Kickflip implementation) | VideoToolbox |
| --- | --- | --- | --- |
| 352×288 | delay: ~0.49–0.57 s; CPU: ~26–37% | delay: ~0.21–0.276 s; CPU: ~9–10% | delay: ~0.03–0.04 s; CPU: ~7–8% |
| 640×480 | delay: ~0.49–0.57 s; CPU: ~40–70% | delay: ~0.22–0.24 s; CPU: ~9–10% | delay: ~0.035 s; CPU: ~7–8% |
| 1280×720 | delay: ~0.64–0.70 s; CPU: ~120–170% | delay: ~0.23–0.3 s; CPU: ~9–10% | delay: ~0.044–0.045 s; CPU: ~8–9% |
| 1920×1080 | delay: ~1.08–1.26 s; CPU: >160% | delay: ~0.26 s; CPU: ~9–10% | delay: ~0.0615–0.069 s; CPU: ~9–10% |

iPad Air 2:

| Resolution | ffmpeg with x264 (software) | AVAssetWriter (Kickflip implementation) | VideoToolbox |
| --- | --- | --- | --- |
| 352×288 | delay: ~0.53–0.62 s; CPU: ~40–50% | delay: ~0.29 s; CPU: ~9–10% | delay: ~0.026 s; CPU: ~6–8% |
| 640×480 | delay: ~0.53–0.58 s; CPU: ~45–60% | delay: ~0.29 s; CPU: ~9–10% | delay: ~0.029 s; CPU: ~7–10% |
| 1280×720 | delay: ~0.57–0.61 s; CPU: ~90–170% | delay: ~0.3 s; CPU: ~9–10% | delay: ~0.03 s; CPU: ~9–11% |
| 1920×1080 | delay: ~1.76 s; CPU: 180–270% | delay: ~0.32 s; CPU: ~9–12% | delay: ~0.038 s; CPU: ~10–12% |

To make the data easier to read, we put it on bar charts. On the charts below you can see how the three encoding methods fare against each other.

[Chart: CPU usage by encoding method]
[Chart: stream delay by encoding method]

Clearly, the more powerful CPUs handle software encoding much better than the older ones, but such a high CPU load is unacceptable even for modern handheld devices with improved batteries. To top it all off, the efficiency of software encoding remains far lower than that of the hardware encoders.

This small test shows the real advantage that hardware encoders have over software solutions. VideoToolbox is also more flexible and efficient when it comes to compressing and broadcasting video.

It is important to note that the delay of the AVAssetWriter solution may increase depending on how a developer uses it. If minimal delay is the goal, VideoToolbox is clearly preferable.

Long Live The Battery!

To show just how much impact encoding has on the end user, we also tested battery life: we measured how long a device can keep broadcasting a stream on a single charge. The test was conducted on an iPhone 5s, starting from a 100% charge, at 1080p resolution. Here is the data.

[Chart: battery life while broadcasting a 1080p stream, by encoding method]

The results show clearly that with software encoding the battery lasted less than 2 hours while hardware solutions extended this period to more than 3 hours.

Conclusion

The tests allowed us to confirm that hardware encoders do their job far better than software ones. Moreover, we measured the effectiveness of the popular encoding methods for iOS and now know exactly which method to use when building a video broadcasting feature into an iOS app!
