Concepts:
NALUs: NALUs are simply chunks of data of varying length that begin with a NALU start code header, 0x00 00 00 01 YY, where the first 5 bits of YY tell you what type of NALU this is and therefore what type of data follows the header. (Since you only need the first 5 bits, I use YY & 0x1F to get just the relevant bits.) I list what all these types are in the array NSString * const naluTypesStrings[], but you don't need to know what they all are.
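As a quick illustration, here is a minimal sketch of reading the type out of a NALU, assuming frame points at a single NALU with a 4-byte start code (the uint8_t *frame from my code below) and naluTypesStrings is the array listed under Code Example:

uint8_t naluHeaderByte = frame[4];      // the YY byte right after 0x00 00 00 01
int naluType = naluHeaderByte & 0x1F;   // low 5 bits = NALU type
NSLog(@"Received NALU of type \"%@\"", naluTypesStrings[naluType]);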
Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The 2 you need to set are the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS), and they each have their own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.
H.264 Stream Format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an i frame (aka IDR frame or flush frame) NALU. Then you will receive several P frame NALUs (maybe a few dozen or so), then another set of parameters (which may be the same as the initial parameters) and an i frame, more P frames, etc. i frames are much bigger than P frames. Conceptually you can think of the i frame as an entire image of the video, and the P frames as just the changes that have been made to that i frame, until you receive the next i frame.
Procedure:
- Generate individual NALUs from your H.264 stream. I cannot show code for this step since it depends a lot on what video source you're using. I made this graphic to show what I was working with ("data" in the graphic is "frame" in my following code), but your case may and probably will differ. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame) which was one of 2 types. In the diagram, those 2 frame types are the 2 big purple boxes.
- Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs with CMVideoFormatDescriptionCreateFromH264ParameterSets(). You cannot display any frames without doing this first. The SPS and PPS may look like a jumble of numbers, but VTD knows what to do with them. All you need to know is that CMVideoFormatDescriptionRef is a description of video data, like width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, etc.), aspect ratio, color space, etc. Your decoder will hold onto the parameters until a new set arrives (sometimes parameters are resent regularly even when they haven't changed). (A sketch of this call appears after this list.)
- Re-package your IDR and non-IDR frame NALUs according to the "AVCC" format. This means removing the NALU start codes and replacing them with a 4-byte header that states the length of the NALU. You don't need to do this for the SPS and PPS NALUs. (Note that the 4-byte NALU length header is in big-endian, so if you have a UInt32 value it must be byte-swapped before copying to the CMBlockBuffer using CFSwapInt32. I do this in my code with the htonl function call.)
- Package the IDR and non-IDR NALU frames into a CMBlockBuffer. Do not do this with the SPS/PPS parameter NALUs. All you need to know about CMBlockBuffers is that they are a method to wrap arbitrary blocks of data in Core Media. (Any compressed video data in a video pipeline is wrapped in this.) (Steps 3-4 are sketched together after this list.)
- Package the CMBlockBuffer into a CMSampleBuffer. All you need to know about CMSampleBuffers is that they wrap up our CMBlockBuffers with other information (here it would be the CMVideoFormatDescription and CMTime, if CMTime is used).
- Create a VTDecompressionSessionRef and feed the sample buffers into VTDecompressionSessionDecodeFrame(). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method and you won't need to use VTDecompSession. It's simpler to set up, but will not throw errors if something goes wrong like VTD will. (Steps 5-6 are sketched together after this list.)
- In the VTDecompSession callback, use the resultant CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my StackOverflow answer here.
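To make step 2 concrete, here is a minimal sketch of creating the format description, assuming sps and pps point at the two parameter set payloads with their start codes already stripped (those variable names are placeholders, not part of my code below):

uint8_t *parameterSetPointers[2] = {sps, pps};
size_t parameterSetSizes[2] = {_spsSize, _ppsSize};

OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                      2,   // parameter set count
                                                                      (const uint8_t *const *)parameterSetPointers,
                                                                      parameterSetSizes,
                                                                      4,   // size of the AVCC length header
                                                                      &_formatDesc);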
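And a minimal sketch of steps 3-4, assuming data points at one IDR or non-IDR NALU that still begins with its 4-byte start code and blockLength is its total size (again, hypothetical names):

// overwrite the start code with the NALU length in big-endian
uint32_t dataLength32 = htonl(blockLength - 4);
memcpy(data, &dataLength32, sizeof(uint32_t));

// wrap the AVCC-formatted NALU in a CMBlockBuffer without copying it
CMBlockBufferRef blockBuffer = NULL;
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                     data, blockLength,
                                                     kCFAllocatorNull,   // don't free "data" for us
                                                     NULL, 0, blockLength,
                                                     0, &blockBuffer);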
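Finally, a sketch of steps 5-6, assuming _decompressionSession was already created with VTDecompressionSessionCreate() and blockBuffer/blockLength come from the previous sketch:

// wrap the block buffer in a sample buffer (no CMTime used here)
CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSizes[] = {blockLength};
OSStatus status = CMSampleBufferCreate(kCFAllocatorDefault,
                                       blockBuffer, true,
                                       NULL, NULL,    // no makeDataReady callback
                                       _formatDesc,
                                       1,             // one sample
                                       0, NULL,       // no timing info
                                       1, sampleSizes,
                                       &sampleBuffer);

// hand the frame to the decoder; the output arrives in the session's callback
if (status == noErr)
{
    VTDecodeInfoFlags infoFlags;
    VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer,
                                      kVTDecodeFrame_EnableAsynchronousDecompression,
                                      NULL, &infoFlags);
}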
Other notes:
- H.264 streams can vary a lot. From what I learned, NALU start code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 (0x00 00 00 01). My code works for 4 bytes; you will need to change a few things around if you're working with 3. (A sketch for telling the two apart appears after these notes.)
- If you want to know more about NALUs, I found this answer to be very helpful. In my case, I found that I didn't need to ignore the "emulation prevention" bytes as described, so I personally skipped that step, but you may need to know about that.
- If your VTDecompressionSession outputs an error number (like -12909), look up the error code in your Xcode project. Find the VideoToolbox framework in your project navigator, open it, and find the header VTErrors.h. If you can't find it, I've also included all the error codes below in another answer.
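If you do need to handle both start code lengths, here is a minimal sketch of detecting which one you have, assuming frame points at the start of a NALU:

int startCodeLength = 0;
if (frame[0] == 0x00 && frame[1] == 0x00 && frame[2] == 0x00 && frame[3] == 0x01)
    startCodeLength = 4;
else if (frame[0] == 0x00 && frame[1] == 0x00 && frame[2] == 0x01)
    startCodeLength = 3;
// the NALU type byte then sits at frame[startCodeLength]; note that a 3-byte
// start code can't simply be overwritten in place by the 4-byte AVCC length header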
Code Example:
So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).
#import <AVFoundation/AVFoundation.h>   // needed for AVSampleBufferDisplayLayer
#import <VideoToolbox/VideoToolbox.h>

@property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;         // created from the SPS and PPS
@property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
@property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;         // only used for the AVSampleBufferDisplayLayer route
@property (nonatomic, assign) int spsSize;
@property (nonatomic, assign) int ppsSize;
The following array is only used so that you can print out what type of NALU frame you are receiving. If you know what all these types mean, good for you, you know more about H.264 than me :) My code only handles types 1, 5, 7 and 8.
NSString * const naluTypesStrings[] =
{
@"0: Unspecified (non-VCL)",
@"1: Coded slice of a non-IDR picture (VCL)", // P frame
@"2: Coded slice data partition A (VCL)",
@"3: Coded slice data partition B (VCL)",
@"4: Coded slice data partition C (VCL)",
@"5: Coded slice of an IDR picture (VCL)", // I frame
@"6: Supplemental enhancement information (SEI) (non-VCL)",
@"7: Sequence parameter set (non-VCL)", // SPS parameter
@"8: Picture parameter set (non-VCL)", // PPS parameter
@"9: Access unit delimiter (non-VCL)",
@"10: End of sequence (non-VCL)",
@"11: End of stream (non-VCL)",
@"12: Filler data (non-VCL)",
@"13: Sequence parameter set extension (non-VCL)",
@"14: Prefix NAL unit (non-VCL)",
@"15: Subset sequence parameter set (non-VCL)",
@"16: Reserved (non-VCL)",
@"17: Reserved (non-VCL)",
@"18: Reserved (non-VCL)",
@"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
@"20: Coded slice extension (non-VCL)",
@"21: Coded slice extension for depth view components (non-VCL)",
@"22: Reserved (non-VCL)",
@"23: Reserved (non-VCL)",
@"24: STAP-A Single-time aggregation packet (non-VCL)",
@"25: STAP-B Single-time aggregation packet (non-VCL)",
@"26: MTAP16 Multi-time aggregation packet (non-VCL)",
@"27: MTAP24 Multi-time aggregation packet (non-VCL)",
@"28: FU-A Fragmentation unit (non-VCL)",
@"29: FU-B Fragmentation unit (non-VCL)",
@"30: Unspecified (non-VCL)",
@"31: Unspecified (non-VCL)",
};
Now this is where all the magic happens.
-(