In Part 1, we talked about how developers struggled to broadcast video from an iOS device. Now we have direct access to the compressed frames, and this accessibility gives us the freedom we dreamt about. We came up with a qualitatively better method of preparing frames for subsequent broadcasting. In general, the process can be split into three steps.
Step 1. Video capturing. The camera captures video, and the device wraps the media samples and their metadata into CMSampleBuffer packages.
Step 2. Video compression. The captured data is compressed with the help of VideoToolbox, which compresses the data carried within the CMSampleBuffer packages.
Step 3. Conversion into NALUs to optimize the stream for online broadcasting.
Let’s Walk Through The Process Step By Step
There are numerous documents and examples explaining how the first step is done, so we only want to point out that at this stage we receive a stream of CMSampleBuffers containing uncompressed CVPixelBuffer data.
During the second stage we create and configure a VTCompressionSessionRef. To compress an input frame, we call the VTCompressionSessionEncodeFrame function, passing it the pixel buffer extracted from the captured CMSampleBuffer. When the operation finishes, the encoder invokes a callback function that we set up during the initialization of the VTCompressionSessionRef. As a result of this process we receive a new CMSampleBuffer that now contains compressed data: it is still a CMSampleBuffer stream, but one carrying CMBlockBuffer structures with the compressed video.
The next step is to convert the stream of CMSampleBuffers into a stream of NALUs (Network Abstraction Layer Units), which is how it is usually done when working with an H.264 encoder. An H.264 stream can come in two different formats: Annex B and AVCC. Apple calls the format most commonly used for streaming an Elementary Stream; we will call it Annex B. iOS media libraries work with H.264 streams in the AVCC format (MPEG-4 stream). Regardless of the format, there are 19 different NALU types, and each NALU carries one of two kinds of data: VCL (Video Coding Layer) data or metadata. Each package has a descriptive header, so it can be easily parsed and processed. The core difference between the Annex B and AVCC formats lies in how NALUs are delimited within the video stream.
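As a minimal sketch of what parsing that descriptive header looks like, the first byte of every H.264 NALU encodes the unit type in its low five bits. The helper name below is our own, not part of any Apple API:

```swift
import Foundation

// Decode the one-byte H.264 NALU header:
// bit 7 = forbidden_zero_bit, bits 5-6 = nal_ref_idc, bits 0-4 = nal_unit_type.
func nalUnitType(of headerByte: UInt8) -> UInt8 {
    return headerByte & 0x1F
}

// A few well-known nal_unit_type values from the H.264 spec.
let nalTypeIDR: UInt8 = 5   // IDR slice (I-frame) – VCL data
let nalTypeSPS: UInt8 = 7   // Sequence Parameter Set – metadata
let nalTypePPS: UInt8 = 8   // Picture Parameter Set – metadata

// Example header bytes: 0x65 → IDR slice, 0x67 → SPS, 0x68 → PPS.
print(nalUnitType(of: 0x65)) // 5
print(nalUnitType(of: 0x67)) // 7
print(nalUnitType(of: 0x68)) // 8
```

This is all a demultiplexer needs in order to tell VCL packages apart from parameter sets when scanning a stream.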
Two Formats – One Way
A NALU in Annex B doesn’t carry its own size; instead, it begins with a start code, usually 0x000001 or 0x00000001 (3 or 4 bytes). This allows splitting the whole stream into individual NALUs.
AVCC defines the size of each NALU with a length header that precedes the NALU itself. The header is usually 4 bytes long, but may be shorter.
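The two framings can be sketched with plain byte arrays. The code below frames the same dummy NALU both ways; the payload bytes are made up for illustration:

```swift
import Foundation

// A dummy NALU: a one-byte header (0x65 = IDR slice) plus fake payload bytes.
let nalu: [UInt8] = [0x65, 0x88, 0x84, 0x21]

// Annex B framing: prefix the NALU with a start code; a receiver splits the
// stream by scanning for these codes.
let startCode: [UInt8] = [0x00, 0x00, 0x00, 0x01]
let annexBFramed = startCode + nalu

// AVCC framing: prefix the NALU with its length as a 4-byte big-endian integer.
let len = UInt32(nalu.count)
let lengthHeader: [UInt8] = [UInt8((len >> 24) & 0xFF), UInt8((len >> 16) & 0xFF),
                             UInt8((len >> 8) & 0xFF),  UInt8(len & 0xFF)]
let avccFramed = lengthHeader + nalu

print(annexBFramed) // [0, 0, 0, 1, 101, 136, 132, 33]
print(avccFramed)   // [0, 0, 0, 4, 101, 136, 132, 33]
```

Only the first four bytes differ: a fixed start code in Annex B versus the big-endian payload length in AVCC.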
Each CMSampleBuffer package with compressed data contains the following:
- PTS (CMTime) – the presentation time stamp
- Format description (CMVideoFormatDescription) – describes the video format, including the SPS and PPS parameter sets
- Block buffer (CMBlockBuffer) – contains part of a compressed frame or a whole one
A CMSampleBuffer stream is a stream of I-, B-, and P-frames. Each buffer may contain one or more NALUs in AVCC format. Annex B, which is used for transmission, is a sequence of PPS (Picture Parameter Set), SPS (Sequence Parameter Set), I-frame, B-frame, and P-frame NALUs; the number of P- and B-frames may vary.
PPS and SPS contain parameters needed for decoding and must precede each I-frame.
For each NALU, the length header is swapped for a start code and the NALU is appended to the output stream. To correctly determine the length and number of NALUs within a CMBlockBuffer, we use the length header that precedes each NALU.
The length is encoded in big-endian format, so on a little-endian CPU we must byte-swap it to get the correct NALU length. When a CMSampleBuffer contains an I-frame, we build PPS and SPS NALUs out of the I-frame’s format description and put them before the other NALUs from the CMBlockBuffer of the corresponding CMSampleBuffer.
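The conversion above can be sketched on raw bytes. In real code the input would be copied out of the CMBlockBuffer and the parameter sets fetched from the CMVideoFormatDescription; here they are plain arrays with made-up contents, and the function name is our own:

```swift
import Foundation

let startCode: [UInt8] = [0x00, 0x00, 0x00, 0x01]

// Replace each 4-byte big-endian length header in an AVCC buffer with a start
// code. `blockData` stands in for the bytes copied out of a CMBlockBuffer;
// `parameterSets` stands in for SPS/PPS taken from the format description.
func annexBStream(fromAVCC blockData: [UInt8],
                  prepending parameterSets: [[UInt8]] = []) -> [UInt8] {
    var out: [UInt8] = []
    // SPS and PPS must precede the I-frame NALUs in the output.
    for ps in parameterSets { out += startCode + ps }
    var offset = 0
    while offset + 4 <= blockData.count {
        // Read the big-endian NALU length...
        let len = (UInt32(blockData[offset])     << 24)
                | (UInt32(blockData[offset + 1]) << 16)
                | (UInt32(blockData[offset + 2]) << 8)
                |  UInt32(blockData[offset + 3])
        offset += 4
        let end = offset + Int(len)
        guard end <= blockData.count else { break } // malformed buffer
        // ...and emit the NALU with a start code instead of the length header.
        out += startCode + blockData[offset..<end]
        offset = end
    }
    return out
}

// Two fake NALUs in AVCC form (lengths 2 and 3, big-endian), plus fake SPS/PPS.
let avcc: [UInt8] = [0, 0, 0, 2, 0x65, 0xAA, 0, 0, 0, 3, 0x41, 0xBB, 0xCC]
let sps: [UInt8] = [0x67, 0x42]
let pps: [UInt8] = [0x68, 0xCE]
let stream = annexBStream(fromAVCC: avcc, prepending: [sps, pps])
```

Reading the length byte-by-byte, most significant byte first, is what "swapping from big-endian" amounts to in practice, and it sidesteps any alignment issues when walking the buffer.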
All of this results in an H.264 Annex B stream that is ready to be broadcast and displayed on other devices!
In the first and second parts we examined real-time video streaming, how it is done on iOS, and took the process apart piece by piece. And now:
Let’s Compare The Results Of Video Encoding On Various iOS Devices!
Disclaimer
To conduct an informative experiment comparing various video encoding methods, we created a test environment: an application that allows us to measure the results properly.
The app uses three encoders:
- Hardware – accessed via the VideoToolbox library.
- Hardware – accessed via AVAssetWriter. For this purpose we used the implementation from the KickFlip library, an open-source broadcasting solution for iOS applications.
- Software – a compiled FFmpeg 3.0 library with the x264 library compiled as a dependency.
We wanted to find the limits of each method, so we ran the tests at multiple resolutions supported by AVCaptureSession: 352×288, 640×480, 1280×720, 1920×1080, and 3840×2160. The devices chosen for the tests: iPhone 4S – the weakest device running iOS 8, iPhone 6 Plus, and iPad Air 2 – one of the most powerful devices on the market. During the tests we measured CPU usage and delay when encoding video in the H.264 format.
There is a certain margin of error, and results acquired in a different environment might differ. Nevertheless, our results clearly show the difference in encoding efficiency across devices.
iPhone 4S:

| Resolution | ffmpeg with x264 (sw) | AVAssetWriter (KickFlip implementation) | VideoToolbox |
| --- | --- | --- | --- |
| 352×288 | delay: ~0.60 s, CPU: ~55–60% | delay: ~0.39–0.46 s, CPU: ~8–12% | delay: ~0.07–0.087 s, CPU: ~7–9% |
| 640×480 | delay: ~0.75–0.85 s, CPU: ~130–160% | delay: ~0.46–0.5 s, CPU: ~8–12% | delay: ~0.067–0.087 s, CPU: ~7–9% |
| 1280×720 | delay: ~1.55–1.63 s, CPU: >160% | delay: ~0.688–0.77 s, CPU: ~8–12% | delay: ~0.114–0.118 s, CPU: ~7–9% |
| 1920×1080 | delay: ~3.8 s, CPU: >160% | delay: ~0.84–0.88 s, CPU: ~8–12% | delay: ~0.177–0.181 s, CPU: ~7–9% |
It is clear that the software encoder overloads the CPU and fails to deliver reasonable results even at 640×480. Note that a CPU load above 100% means some frames will be dropped because the CPU cannot process them in time. Obviously, the battery will also drain faster.
The hardware encoders, on the other hand, performed wonderfully at every resolution, keeping average CPU usage within 7–12%. AVAssetWriter, however, showed a noticeably longer delay than VideoToolbox.
iPhone 6 Plus and iPad Air 2
Here are the results after testing the devices.
iPhone 6 Plus:
| Resolution | ffmpeg with x264 (sw) | AVAssetWriter (KickFlip implementation) | VideoToolbox |
| --- | --- | --- | --- |
| 352×288 | delay: ~0.49–0.57 s, CPU: ~26–37% | delay: ~0.21–0.276 s, CPU: ~9–10% | delay: ~0.03–0.04 s, CPU: ~7–8% |
| 640×480 | delay: ~0.49–0.57 s, CPU: ~40–70% | delay: ~0.22–0.24 s, CPU: ~9–10% | delay: ~0.035 s, CPU: ~7–8% |
| 1280×720 | delay: ~0.64–0.70 s, CPU: ~120–170% | delay: ~0.23–0.3 s, CPU: ~9–10% | delay: ~0.044–0.045 s, CPU: ~8–9% |
| 1920×1080 | delay: ~1.08–1.26 s, CPU: >160% | delay: ~0.26 s, CPU: ~9–10% | delay: ~0.0615–0.069 s, CPU: ~9–10% |
iPad Air 2:
| Resolution | ffmpeg with x264 (sw) | AVAssetWriter (KickFlip implementation) | VideoToolbox |
| --- | --- | --- | --- |
| 352×288 | delay: ~0.53–0.62 s, CPU: ~40–50% | delay: ~0.29 s, CPU: ~9–10% | delay: ~0.026 s, CPU: ~6–8% |
| 640×480 | delay: ~0.53–0.58 s, CPU: ~45–60% | delay: ~0.29 s, CPU: ~9–10% | delay: ~0.029 s, CPU: ~7–10% |
| 1280×720 | delay: ~0.57–0.61 s, CPU: ~90–170% | delay: ~0.3 s, CPU: ~9–10% | delay: ~0.03 s, CPU: ~9–11% |
| 1920×1080 | delay: ~1.76 s, CPU: ~180–270% | delay: ~0.32 s, CPU: ~9–12% | delay: ~0.038 s, CPU: ~10–12% |
To make the data easier to read, we also plotted it as bar charts. The charts below show how the three encoding methods fare against each other.
Apparently, newer, more powerful CPUs handle software encoding much better than previous generations, but such a high CPU load is unacceptable even for modern handheld devices with improved batteries. To top it off, software encoding is still far less efficient than the hardware encoders.
This small test shows the true advantage hardware encoders have over software solutions. VideoToolbox is considerably more versatile and efficient when it comes to compressing and broadcasting video.
Note that the delay of the AVAssetWriter solution may grow depending on the encoding approach a developer uses. If minimal delay is the goal, VideoToolbox is clearly preferable.
Long Live The Battery!
To show just how much impact encoding has on the end user, we conducted a test of the devices’ batteries, measuring how long a device could broadcast a stream on a single charge. The test used an iPhone 5s at 100% battery and 1080p resolution. Here’s the data.
The results show clearly that with software encoding the battery lasted less than 2 hours while hardware solutions extended this period to more than 3 hours.
Conclusion
The tests allowed us to confirm that hardware encoders do their job better than software ones. Moreover, we successfully measured the effectiveness of popular encoding methods on iOS devices and now know exactly which method to use when building a video broadcasting feature for an iOS device!