Concepts:
NALUs: NALUs are simply chunks of data of varying length that begin with a NALU start code header, 0x00 00 00 01 YY, where the first 5 bits of YY tell you what type of NALU this is and therefore what type of data follows the header. (Since you only need the first 5 bits, I use YY & 0x1F to get just the relevant bits.) I list what all these types are in the array NSString * const naluTypesStrings[], but you don't need to know what they all are.
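As a quick illustration, here is a minimal sketch of reading the type out of a NALU, assuming frame points at a single NALU with a 4-byte start code (the uint8_t *frame from my code below) and naluTypesStrings is the array listed under Code Example:

uint8_t naluHeaderByte = frame[4];      // the YY byte right after 0x00 00 00 01
int naluType = naluHeaderByte & 0x1F;   // low 5 bits = NALU type
NSLog(@"Received NALU of type \"%@\"", naluTypesStrings[naluType]);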
Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The 2 you need to set are the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS), and they each have their own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.
H.264 Stream Format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an i frame (aka IDR frame or flush frame) NALU. Then you will receive several P frame NALUs (maybe a few dozen or so), then another set of parameters (which may be the same as the initial parameters) and an i frame, more P frames, etc. i frames are much bigger than P frames. Conceptually you can think of the i frame as an entire image of the video, and the P frames as just the changes that have been made to that i frame, until you receive the next i frame.
Procedure:
- Generate individual NALUs from your H.264 stream. I cannot show code for this step since it depends a lot on what video source you're using. I made this graphic to show what I was working with ("data" in the graphic is "frame" in my following code), but your case may and probably will differ. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame) which was one of 2 types. In the diagram, those 2 frame types are the 2 big purple boxes.
- Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs with CMVideoFormatDescriptionCreateFromH264ParameterSets(). You cannot display any frames without doing this first. The SPS and PPS may look like a jumble of numbers, but VTD knows what to do with them. All you need to know is that CMVideoFormatDescriptionRef is a description of video data, like width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, etc.), aspect ratio, color space, etc. Your decoder will hold onto the parameters until a new set arrives (sometimes parameters are resent regularly even when they haven't changed). (A sketch of this call appears after this list.)
- Re-package your IDR and non-IDR frame NALUs according to the "AVCC" format. This means removing the NALU start codes and replacing them with a 4-byte header that states the length of the NALU. You don't need to do this for the SPS and PPS NALUs. (Note that the 4-byte NALU length header is in big-endian, so if you have a UInt32 value it must be byte-swapped before copying to the CMBlockBuffer using CFSwapInt32. I do this in my code with the htonl function call.)
- Package the IDR and non-IDR NALU frames into a CMBlockBuffer. Do not do this with the SPS/PPS parameter NALUs. All you need to know about CMBlockBuffers is that they are a method to wrap arbitrary blocks of data in Core Media. (Any compressed video data in a video pipeline is wrapped in this.) (Steps 3-4 are sketched together after this list.)
- Package the CMBlockBuffer into a CMSampleBuffer. All you need to know about CMSampleBuffers is that they wrap up our CMBlockBuffers with other information (here it would be the CMVideoFormatDescription and CMTime, if CMTime is used).
- Create a VTDecompressionSessionRef and feed the sample buffers into VTDecompressionSessionDecodeFrame(). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method and you won't need to use VTDecompSession. It's simpler to set up, but will not throw errors if something goes wrong like VTD will. (Steps 5-6 are sketched together after this list.)
- In the VTDecompSession callback, use the resultant CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my StackOverflow answer here.
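To make step 2 concrete, here is a minimal sketch of creating the format description, assuming sps and pps point at the two parameter set payloads with their start codes already stripped (those variable names are placeholders, not part of my code below):

uint8_t *parameterSetPointers[2] = {sps, pps};
size_t parameterSetSizes[2] = {_spsSize, _ppsSize};

OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                      2,   // parameter set count
                                                                      (const uint8_t *const *)parameterSetPointers,
                                                                      parameterSetSizes,
                                                                      4,   // size of the AVCC length header
                                                                      &_formatDesc);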
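And a minimal sketch of steps 3-4, assuming data points at one IDR or non-IDR NALU that still begins with its 4-byte start code and blockLength is its total size (again, hypothetical names):

// overwrite the start code with the NALU length in big-endian
uint32_t dataLength32 = htonl(blockLength - 4);
memcpy(data, &dataLength32, sizeof(uint32_t));

// wrap the AVCC-formatted NALU in a CMBlockBuffer without copying it
CMBlockBufferRef blockBuffer = NULL;
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                     data, blockLength,
                                                     kCFAllocatorNull,   // don't free "data" for us
                                                     NULL, 0, blockLength,
                                                     0, &blockBuffer);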
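Finally, a sketch of steps 5-6, assuming _decompressionSession was already created with VTDecompressionSessionCreate() and blockBuffer/blockLength come from the previous sketch:

// wrap the block buffer in a sample buffer (no CMTime used here)
CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSizes[] = {blockLength};
OSStatus status = CMSampleBufferCreate(kCFAllocatorDefault,
                                       blockBuffer, true,
                                       NULL, NULL,    // no makeDataReady callback
                                       _formatDesc,
                                       1,             // one sample
                                       0, NULL,       // no timing info
                                       1, sampleSizes,
                                       &sampleBuffer);

// hand the frame to the decoder; the output arrives in the session's callback
if (status == noErr)
{
    VTDecodeInfoFlags infoFlags;
    VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer,
                                      kVTDecodeFrame_EnableAsynchronousDecompression,
                                      NULL, &infoFlags);
}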
Other notes:
- H.264 streams can vary a lot. From what I learned, NALU start code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 (0x00 00 00 01). My code works for 4 bytes; you will need to change a few things around if you're working with 3. (A sketch for telling the two apart appears after these notes.)
- If you want to know more about NALUs, I found this answer to be very helpful. In my case, I found that I didn't need to ignore the "emulation prevention" bytes as described, so I personally skipped that step, but you may need to know about that.
- If your VTDecompressionSession outputs an error number (like -12909), look up the error code in your Xcode project. Find the VideoToolbox framework in your project navigator, open it, and find the header VTErrors.h. If you can't find it, I've also included all the error codes below in another answer.
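If you do need to handle both start code lengths, here is a minimal sketch of detecting which one you have, assuming frame points at the start of a NALU:

int startCodeLength = 0;
if (frame[0] == 0x00 && frame[1] == 0x00 && frame[2] == 0x00 && frame[3] == 0x01)
    startCodeLength = 4;
else if (frame[0] == 0x00 && frame[1] == 0x00 && frame[2] == 0x01)
    startCodeLength = 3;
// the NALU type byte then sits at frame[startCodeLength]; note that a 3-byte
// start code can't simply be overwritten in place by the 4-byte AVCC length header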
Code Example:
So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).
#import <AVFoundation/AVFoundation.h>   // needed for AVSampleBufferDisplayLayer
#import <VideoToolbox/VideoToolbox.h>

@property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;         // created from the SPS and PPS
@property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
@property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;         // only used for the AVSampleBufferDisplayLayer route
@property (nonatomic, assign) int spsSize;
@property (nonatomic, assign) int ppsSize;
The following array is only used so that you can print out what type of NALU frame you are receiving. If you know what all these types mean, good for you, you know more about H.264 than me :) My code only handles types 1, 5, 7 and 8.
NSString * const naluTypesStrings[] =
{
@"0: Unspecified (non-VCL)",
@"1: Coded slice of a non-IDR picture (VCL)", // P frame
@"2: Coded slice data partition A (VCL)",
@"3: Coded slice data partition B (VCL)",
@"4: Coded slice data partition C (VCL)",
@"5: Coded slice of an IDR picture (VCL)", // I frame
@"6: Supplemental enhancement information (SEI) (non-VCL)",
@"7: Sequence parameter set (non-VCL)", // SPS parameter
@"8: Picture parameter set (non-VCL)", // PPS parameter
@"9: Access unit delimiter (non-VCL)",
@"10: End of sequence (non-VCL)",
@"11: End of stream (non-VCL)",
@"12: Filler data (non-VCL)",
@"13: Sequence parameter set extension (non-VCL)",
@"14: Prefix NAL unit (non-VCL)",
@"15: Subset sequence parameter set (non-VCL)",
@"16: Reserved (non-VCL)",
@"17: Reserved (non-VCL)",
@"18: Reserved (non-VCL)",
@"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
@"20: Coded slice extension (non-VCL)",
@"21: Coded slice extension for depth view components (non-VCL)",
@"22: Reserved (non-VCL)",
@"23: Reserved (non-VCL)",
@"24: STAP-A Single-time aggregation packet (non-VCL)",
@"25: STAP-B Single-time aggregation packet (non-VCL)",
@"26: MTAP16 Multi-time aggregation packet (non-VCL)",
@"27: MTAP24 Multi-time aggregation packet (non-VCL)",
@"28: FU-A Fragmentation unit (non-VCL)",
@"29: FU-B Fragmentation unit (non-VCL)",
@"30: Unspecified (non-VCL)",
@"31: Unspecified (non-VCL)",
};
Now this is where all the magic happens.
-(