【1】
MP4 (MPEG-4 Part 14) is a common multimedia container format, defined in the ISO/IEC 14496-14 standard.
1. The smallest building block: the box
2. Overall structure of an MP4 file
Code | Abstract | Defined in/by |
ainf | Asset information to identify, license and play | DECE |
albm | Album title and track number (user-data) | 3GPP |
auth | Media author name (user-data) | 3GPP |
avcn | AVC NAL Unit Storage Box | DECE |
bloc | Base location and purchase location for license acquisition | DECE |
bpcc | Bits per component | JP2 |
buff | Buffering information | AVC |
bxml | binary XML container | ISO |
ccid | OMA DRM Content ID | OMA DRM 2.1 |
cdef | type and ordering of the components within the codestream | JP2 |
clsf | Media classification (user-data) | 3GPP |
cmap | mapping between a palette and codestream components | JP2 |
co64 | 64-bit chunk offset | ISO |
colr | specifies the colourspace of the image | JP2 |
cprt | copyright etc. (user-data) | ISO |
crhd | reserved for ClockReferenceStream header | MP4V1 |
cslg | composition to decode timeline mapping | ISO |
ctts | (composition) time to sample | ISO |
cvru | OMA DRM Cover URI | OMA DRM 2.1 |
dcfD | Marlin DCF Duration, user-data atom type | OMArlin |
dinf | data information box, container | ISO |
dref | data reference box, declares source(s) of media data in track | ISO |
dscp | Media description (user-data) | 3GPP |
dsgd | DVB Sample Group Description Box | DVB |
dstg | DVB Sample to Group Box | DVB |
edts | edit list container | ISO |
elst | an edit list | ISO |
feci | FEC Information | ISO |
fecr | FEC Reservoir | ISO |
fiin | FD Item Information | ISO |
fire | File Reservoir | ISO |
fpar | File Partition | ISO |
free | free space | ISO |
frma | original format box | ISO |
ftyp | file type and compatibility | JP2, ISO |
gitn | Group ID to name | ISO |
gnre | Media genre (user-data) | 3GPP |
grpi | OMA DRM Group ID | OMA DRM 2.0 |
hdlr | handler, declares the media (handler) type | ISO |
hmhd | hint media header, overall information (hint track only) | ISO |
hpix | Hipix Rich Picture (user-data or meta-data) | HIPIX |
icnu | OMA DRM Icon URI | OMA DRM 2.0 |
ID32 | ID3 version 2 container | inline |
idat | Item data | ISO |
ihdr | Image Header | JP2 |
iinf | item information | ISO |
iloc | item location | ISO |
imif | IPMP Information box | ISO |
infu | OMA DRM Info URL | OMA DRM 2.0 |
iods | Object Descriptor container box | MP4V1 |
iphd | reserved for IPMP Stream header | MP4V1 |
ipmc | IPMP Control Box | ISO |
ipro | item protection | ISO |
iref | Item reference | ISO |
jP$20$20 | JPEG 2000 Signature | JP2 |
jp2c | JPEG 2000 contiguous codestream | JP2 |
jp2h | Header | JP2 |
jp2i | intellectual property information | JP2 |
kywd | Media keywords (user-data) | 3GPP |
loci | Media location information (user-data) | 3GPP |
lrcu | OMA DRM Lyrics URI | OMA DRM 2.1 |
m7hd | reserved for MPEG7Stream header | MP4V1 |
mdat | media data container | ISO |
mdhd | media header, overall information about the media | ISO |
mdia | container for the media information in a track | ISO |
mdri | Mutable DRM information | OMA DRM 2.0 |
meco | additional metadata container | ISO |
mehd | movie extends header box | ISO |
mere | metabox relation | ISO |
meta | Metadata container | ISO |
mfhd | movie fragment header | ISO |
mfra | Movie fragment random access | ISO |
mfro | Movie fragment random access offset | ISO |
minf | media information container | ISO |
mjhd | reserved for MPEG-J Stream header | MP4V1 |
moof | movie fragment | ISO |
moov | container for all the meta-data | ISO |
mvcg | Multiview group | AVC |
mvci | Multiview Information | AVC |
mvex | movie extends box | ISO |
mvhd | movie header, overall declarations | ISO |
mvra | Multiview Relation Attribute | AVC |
nmhd | Null media header, overall information (some tracks only) | ISO |
ochd | reserved for ObjectContentInfoStream header | MP4V1 |
odaf | OMA DRM Access Unit Format | OMA DRM 2.0 |
odda | OMA DRM Content Object | OMA DRM 2.0 |
odhd | reserved for ObjectDescriptorStream header | MP4V1 |
odhe | OMA DRM Discrete Media Headers | OMA DRM 2.0 |
odrb | OMA DRM Rights Object | OMA DRM 2.0 |
odrm | OMA DRM Container | OMA DRM 2.0 |
odtt | OMA DRM Transaction Tracking | OMA DRM 2.0 |
ohdr | OMA DRM Common headers | OMA DRM 2.0 |
padb | sample padding bits | ISO |
paen | Partition Entry | ISO |
pclr | palette which maps a single component in index space to a multiple-component image | JP2 |
pdin | Progressive download information | ISO |
perf | Media performer name (user-data) | 3GPP |
pitm | primary item reference | ISO |
res$20 | grid resolution | JP2 |
resc | grid resolution at which the image was captured | JP2 |
resd | default grid resolution at which the image should be displayed | JP2 |
rtng | Media rating (user-data) | 3GPP |
sbgp | Sample to Group box | AVC, ISO |
schi | scheme information box | ISO |
schm | scheme type box | ISO |
sdep | Sample dependency | AVC |
sdhd | reserved for SceneDescriptionStream header | MP4V1 |
sdtp | Independent and Disposable Samples Box | AVC, ISO |
sdvp | SD Profile Box | SDV |
segr | file delivery session group | ISO |
senc | Sample specific encryption data | DECE |
sgpd | Sample group definition box | AVC, ISO |
sidx | Segment Index Box | 3GPP |
sinf | protection scheme information box | ISO |
skip | free space | ISO |
smhd | sound media header, overall information (sound track only) | ISO |
srmb | System Renewability Message | DVB |
srmc | System Renewability Message container | DVB |
srpp | SRTP Process | ISO |
stbl | sample table box, container for the time/space map | ISO |
stco | chunk offset, partial data-offset information | ISO |
stdp | sample degradation priority | ISO |
sthd | Subtitle Media Header Box | DECE |
stsc | sample-to-chunk, partial data-offset information | ISO |
stsd | sample descriptions (codec types, initialization etc.) | ISO |
stsh | shadow sync sample table | ISO |
stss | sync sample table (random access points) | ISO |
stsz | sample sizes (framing) | ISO |
stts | (decoding) time-to-sample | ISO |
styp | Segment Type Box | 3GPP |
stz2 | compact sample sizes (framing) | ISO |
subs | Sub-sample information | ISO |
swtc | Multiview Group Relation | AVC |
tfad | Track fragment adjustment box | 3GPP |
tfhd | Track fragment header | ISO |
tfma | Track fragment media adjustment box | 3GPP |
tfra | Track fragment random access | ISO |
tibr | Tier Bit rate | AVC |
tiri | Tier Information | AVC |
titl | Media title (user-data) | 3GPP |
tkhd | Track header, overall information about the track | ISO |
traf | Track fragment | ISO |
trak | container for an individual track or stream | ISO |
tref | track reference container | ISO |
trex | track extends defaults | ISO |
trgr | Track grouping information | ISO |
trik | Facilitates random access and trick play modes | DECE |
trun | track fragment run | ISO |
tsel | Track selection (user-data) | 3GPP |
udta | user-data | ISO |
uinf | a tool by which a vendor may provide access to additional information associated with a UUID | JP2 |
UITS | Unique Identifier Technology Solution | Universal Music |
ulst | a list of UUIDs | JP2 |
url$20 | a URL | JP2 |
uuid | user-extension box | ISO, JP2 |
vmhd | video media header, overall information (video track only) | ISO |
vwdi | Multiview Scene Information | AVC |
xml$20 | a tool by which vendors can add XML formatted information | JP2 |
xml$20 | XML container | ISO |
yrrc | Year when media was recorded (user-data) | 3GPP |
Code | Abstract | Defined in/by |
clip | Visual clipping region container | QT |
crgn | Visual clipping region definition | QT |
ctab | Track color-table | QT |
elng | Extended Language Tag | QT |
imap | Track input map definition | QT |
kmat | Compressed visual track matte | QT |
load | Track pre-load definitions | QT |
matt | Visual track matte for compositing | QT |
pnot | Preview container | QT |
wide | Expansion space reservation | QT |
【2】
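1. File Type Box
Box Type: ‘ftyp’
This box normally appears at the very start of an MP4 file and serves as the identifying signature of the MP4 container format, in the same way that the 3-byte 'F' 'L' 'V' header identifies FLV, the bytes 1A 45 DF A3 identify MKV, and ASF_Header_Object identifies ASF.
The ftyp box is laid out as follows: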
- aligned(8) class FileTypeBox
- extends Box(‘ftyp’) {
- unsigned int(32) major_brand;
- unsigned int(32) minor_version;
- unsigned int(32) compatible_brands[]; // to end of the box
- }
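As a minimal sketch of reading this layout, the C function below pulls major_brand, minor_version and the compatible_brands list out of an ftyp box that is already in memory. The names `ftyp_info`, `rd32` and `parse_ftyp` are made up for illustration, and the sketch assumes the box uses the ordinary 32-bit size field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the fields of one ftyp box. */
typedef struct {
    char     major_brand[5];     /* 4 characters plus a terminating NUL */
    uint32_t minor_version;
    size_t   compatible_count;   /* number of compatible_brands entries */
} ftyp_info;

/* Read a big-endian 32-bit integer. */
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse an ftyp box held in buf[0..len). Returns 0 on success, -1 on error.
 * Up to max_brands compatible brands are copied into brands; brands may be
 * NULL when max_brands is 0. */
static int parse_ftyp(const uint8_t *buf, size_t len, ftyp_info *out,
                      char brands[][5], size_t max_brands)
{
    if (len < 16)
        return -1;
    uint32_t size = rd32(buf);
    if (size < 16 || size > len || memcmp(buf + 4, "ftyp", 4) != 0)
        return -1;
    memcpy(out->major_brand, buf + 8, 4);
    out->major_brand[4] = '\0';
    out->minor_version = rd32(buf + 12);
    /* compatible_brands[] runs to the end of the box, 4 bytes per entry */
    out->compatible_count = (size - 16) / 4;
    for (size_t i = 0; i < out->compatible_count && i < max_brands; i++) {
        memcpy(brands[i], buf + 16 + 4 * i, 4);
        brands[i][4] = '\0';
    }
    return 0;
}
```

A caller would typically read the first few dozen bytes of the file, pass them to `parse_ftyp`, and reject the file if it returns -1.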
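2. Movie Box
The moov box contains many child boxes, as marked in the diagram in the previous part. In most files moov comes right after ftyp. It holds the metadata of the MP4 file, that is, the basic information about the audio and video streams. The important boxes inside moov are described below.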
2.1 Movie Header Box
- aligned(8) class MovieHeaderBox extends FullBox(‘mvhd’, version, 0) {
- if (version==1) {
- unsigned int(64) creation_time;
- unsigned int(64) modification_time;
- unsigned int(32) timescale;
- unsigned int(64) duration;
- } else { // version==0
- unsigned int(32) creation_time;
- unsigned int(32) modification_time;
- unsigned int(32) timescale;
- unsigned int(32) duration;
- }
- template int(32) rate = 0x00010000; // typically 1.0
- template int(16) volume = 0x0100; // typically, full volume
- const bit(16) reserved = 0;
- const unsigned int(32)[2] reserved = 0;
- template int(32)[9] matrix =
- { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
- // Unity matrix
- bit(32)[6] pre_defined = 0;
- unsigned int(32) next_track_ID;
- }
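Field | Size (bytes) | Comment |
box size | 4 | size of the box |
box type | 4 | type of the box |
version | 1 | box version, 0 or 1; normally 0 |
flags | 3 | flags |
creation time | 4 | creation time, in seconds since midnight, 1904-01-01 UTC (8 bytes when version is 1) |
modification time | 4 | modification time (8 bytes when version is 1) |
time scale | 4 | number of time units that pass in one second; video commonly uses 90000, though other values are also seen |
duration | 4 | duration in time-scale units (8 bytes when version is 1); in mvhd this is the duration of the whole movie. Dividing duration by time scale gives seconds: an audio track with time scale = 8000 and duration = 560128 lasts 70.016 s, and a video track with time scale = 600 and duration = 42000 lasts 70 s |
rate | 4 | preferred playback rate; the high 16 bits are the integer part and the low 16 bits the fractional part, i.e. [16.16] fixed-point format; 1.0 (0x00010000) means normal forward playback |
volume | 2 | like rate but in [8.8] format; 1.0 (0x0100) means full volume |
reserved | 10 | reserved |
matrix | 36 | video transformation matrix |
pre-defined | 24 | |
next track id | 4 | ID to be used by the next track added |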
Parsing this box therefore yields the key values such as duration and rate.
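The arithmetic from the table can be sketched in C as follows. This is only an illustration: `mvhd_info`, `parse_mvhd` and `mvhd_seconds` are hypothetical names, and the function assumes it is handed the box payload, starting at the version byte just after the 8-byte size/type header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the mvhd values discussed above. */
typedef struct {
    uint32_t timescale;
    uint64_t duration;
    double   rate;     /* [16.16] fixed point converted to floating point */
    double   volume;   /* [8.8] fixed point converted to floating point */
} mvhd_info;

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t rd64(const uint8_t *p)
{
    return ((uint64_t)rd32(p) << 32) | rd32(p + 4);
}

/* p points at the version byte of an mvhd box; len is the payload length.
 * Returns 0 on success, -1 if the payload is too short or the version
 * is unknown. */
static int parse_mvhd(const uint8_t *p, size_t len, mvhd_info *out)
{
    if (len < 1 || p[0] > 1)
        return -1;
    int v1 = (p[0] == 1);
    size_t times = v1 ? 28 : 16;      /* creation .. duration */
    if (len < 4 + times + 6)          /* + rate (4) + volume (2) */
        return -1;
    p += 4;                           /* skip version and flags */
    if (v1) {
        out->timescale = rd32(p + 16);
        out->duration  = rd64(p + 20);
    } else {
        out->timescale = rd32(p + 8);
        out->duration  = rd32(p + 12);
    }
    p += times;
    out->rate   = (int32_t)rd32(p) / 65536.0;
    out->volume = (int16_t)((p[4] << 8) | p[5]) / 256.0;
    return 0;
}

/* Duration in seconds: duration divided by timescale. */
static double mvhd_seconds(const mvhd_info *m)
{
    return m->timescale ? (double)m->duration / m->timescale : 0.0;
}
```

With time scale = 600 and duration = 42000, `mvhd_seconds` gives the 70 seconds quoted in the table.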
2.2.1 Track Header Box
- aligned(8) class TrackHeaderBox
- extends FullBox(‘tkhd’, version, flags){
- if (version==1) {
- unsigned int(64) creation_time;
- unsigned int(64) modification_time;
- unsigned int(32) track_ID;
- const unsigned int(32) reserved = 0;
- unsigned int(64) duration;
- } else { // version==0
- unsigned int(32) creation_time;
- unsigned int(32) modification_time;
- unsigned int(32) track_ID;
- const unsigned int(32) reserved = 0;
- unsigned int(32) duration;
- }
- const unsigned int(32)[2] reserved = 0;
- template int(16) layer = 0;
- template int(16) alternate_group = 0;
- template int(16) volume = {if track_is_audio 0x0100 else 0};
- const unsigned int(16) reserved = 0;
- template int(32)[9] matrix=
- { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
- // unity matrix
- unsigned int(32) width;
- unsigned int(32) height;
- }
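Field | Size (bytes) | Comment |
box size | 4 | size of the box |
box type | 4 | type of the box |
version | 1 | box version, 0 or 1; normally 0 |
flags | 3 | bitwise OR of the predefined flag values: 0x000001 track_enabled, 0x000002 track_in_movie, 0x000004 track_in_preview |
creation time | 4 | creation time, in seconds since midnight, 1904-01-01 UTC (8 bytes when version is 1) |
modification time | 4 | modification time (8 bytes when version is 1) |
track id | 4 | track ID; must be unique and must not be 0 |
reserved | 4 | reserved |
duration | 4 | duration of the track, in units of the movie time scale (8 bytes when version is 1) |
reserved | 8 | reserved |
layer | 2 | video layer, 0 by default; lower values are on top |
alternate group | 2 | track grouping, 0 by default, meaning the track is not grouped with any other track |
volume | 2 | [8.8] format; 1.0 (0x0100) means full volume for an audio track, otherwise 0 |
reserved | 2 | reserved |
matrix | 36 | video transformation matrix |
width | 4 | width |
height | 4 | height; width and height are both [16.16] fixed-point values that give the size at which the track is presented, which may differ from the actual picture size in the sample description |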
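To make the field offsets concrete, here is a small C sketch that reads the flags, track_ID, width and height from a tkhd payload and converts the [16.16] values to ordinary numbers. The names `tkhd_info`, `parse_tkhd` and the `TKHD_*` constants are invented for this example; as before, the input is assumed to start at the version byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits of tkhd (constant names are hypothetical). */
enum { TKHD_ENABLED = 0x1, TKHD_IN_MOVIE = 0x2, TKHD_IN_PREVIEW = 0x4 };

typedef struct {
    uint32_t flags;
    uint32_t track_id;
    double   width;    /* presentation width, converted from [16.16] */
    double   height;   /* presentation height, converted from [16.16] */
} tkhd_info;

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* p points at the version byte of a tkhd box; len is the payload length. */
static int parse_tkhd(const uint8_t *p, size_t len, tkhd_info *out)
{
    if (len < 1 || p[0] > 1)
        return -1;
    int v1 = (p[0] == 1);
    /* creation, modification, track_ID, reserved, duration */
    size_t head = v1 ? 32 : 20;
    /* reserved(8) layer(2) alternate_group(2) volume(2) reserved(2)
     * matrix(36) = 52 bytes, then width(4) and height(4) */
    if (len < 4 + head + 52 + 8)
        return -1;
    out->flags    = ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
    out->track_id = rd32(p + 4 + (v1 ? 16 : 8));
    const uint8_t *dim = p + 4 + head + 52;
    out->width  = rd32(dim) / 65536.0;
    out->height = rd32(dim + 4) / 65536.0;
    return 0;
}
```

For a typical video track the result would be something like flags = 3 (enabled and in movie) with width 1920.0 and height 1080.0; audio tracks normally carry 0 for both dimensions.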
【3】
2.2.2 Media Box
2.2.2.1 Media Header Box
Box Type: ‘mdhd’
- aligned(8) class MediaHeaderBox extends FullBox(‘mdhd’, version, 0) {
- if (version==1) {
- unsigned int(64) creation_time;
- unsigned int(64) modification_time;
- unsigned int(32) timescale;
- unsigned int(64) duration;
- } else { // version==0
- unsigned int(32) creation_time;
- unsigned int(32) modification_time;
- unsigned int(32) timescale;
- unsigned int(32) duration;
- }
- bit(1) pad = 0;
- unsigned int(5)[3] language; // ISO-639-2/T language code
- unsigned int(16) pre_defined = 0;
- }
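Field | Size (bytes) | Comment |
box size | 4 | size of the box |
box type | 4 | type of the box |
version | 1 | box version, 0 or 1; normally 0 |
flags | 3 | flags |
creation time | 4 | creation time, in seconds since midnight, 1904-01-01 UTC (8 bytes when version is 1) |
modification time | 4 | modification time (8 bytes when version is 1) |
time scale | 4 | number of time units that pass in one second for this media; video commonly uses 90000, though other values are also seen |
duration | 4 | duration of this media, in time-scale units (8 bytes when version is 1) |
language | 2 | media language code: one pad bit followed by three 5-bit characters of the ISO-639-2/T code, each stored as its ASCII value minus 0x60 |
pre-defined | 2 | |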
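The packed language field is the least obvious entry in this table, so here is a short C sketch of unpacking it. The function name `mdhd_language` is hypothetical; it assumes the two bytes have already been read as one big-endian 16-bit value.

```c
#include <stdint.h>

/* Unpack the 16-bit mdhd language field into a NUL-terminated
 * three-letter ISO-639-2/T code. The top bit is padding; each of the
 * three 5-bit groups holds a character's ASCII value minus 0x60. */
static void mdhd_language(uint16_t packed, char out[4])
{
    out[0] = (char)(((packed >> 10) & 0x1F) + 0x60);
    out[1] = (char)(((packed >> 5)  & 0x1F) + 0x60);
    out[2] = (char)(( packed        & 0x1F) + 0x60);
    out[3] = '\0';
}
```

For instance, the value 0x55C4 unpacks to "und" (undetermined) and 0x15C7 to "eng".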
2.2.2.2 Handler Reference Box
- aligned(8) class HandlerBox extends FullBox(‘hdlr’, version = 0, 0) {
- unsigned int(32) pre_defined = 0;
- unsigned int(32) handler_type;
- const unsigned int(32)[3] reserved = 0;
- string name;
- }
Field | Size (bytes) | Comment |
box size | 4 | size of the box |
box type | 4 | type of the box |
version | 1 | box version; 0 for this box |
flags | 3 | |
pre-defined | 4 | |
handler_type | 4 | ‘vide’: video track; ‘soun’: audio track; ‘hint’: hint track |
reserved | 12 | 0 |
name | string | null-terminated string giving a human-readable name for the track type |
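Since handler_type decides how the rest of the track is interpreted, a parser usually checks it first. The C sketch below classifies a complete hdlr box (size field included) by that four-character code. `track_kind` and `hdlr_track_kind` are illustrative names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { TRACK_VIDEO, TRACK_AUDIO, TRACK_HINT, TRACK_OTHER } track_kind;

/* box points at the first byte of a complete hdlr box; len is the number
 * of bytes available. Unknown or truncated input maps to TRACK_OTHER. */
static track_kind hdlr_track_kind(const uint8_t *box, size_t len)
{
    /* 8 (size, type) + 4 (version, flags) + 4 (pre_defined) = 16 bytes
     * precede handler_type, which is itself 4 bytes long. */
    if (len < 20 || memcmp(box + 4, "hdlr", 4) != 0)
        return TRACK_OTHER;
    const uint8_t *type = box + 16;
    if (memcmp(type, "vide", 4) == 0)
        return TRACK_VIDEO;
    if (memcmp(type, "soun", 4) == 0)
        return TRACK_AUDIO;
    if (memcmp(type, "hint", 4) == 0)
        return TRACK_HINT;
    return TRACK_OTHER;
}
```

A 33-byte hdlr box whose handler_type bytes are 76 69 64 65 is reported as `TRACK_VIDEO`.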
Example:
00 00 00 21 68 64 6C 72 00 00 00 00 00 00 00 00 ; ...!hdlr........
76 69 64 65 00 00 00 00 00 00 00 00 00 00 00 00 ; vide............
00 ; .
2.2.2.3 Media Information Box
2.2.2.3.1 Media Information Header Boxes
- aligned(8) class VideoMediaHeaderBox
- extends FullBox(‘vmhd’, version = 0, 1) {
- template unsigned int(16) graphicsmode = 0; // copy, see below
- template unsigned int(16)[3] opcolor = {0, 0, 0};
- }
Field | Size (bytes) | Comment |
box size | 4 | size of the box |
box type | 4 | type of the box |
version | 1 | box version; 0 for this box |
flags | 3 | flags; 1 for this box |
graphicsmode | 2 | composition mode for this video track, taken from an enumerated set that derived specifications may extend; copy = 0 copies over the existing image |
opcolor | 2*3 | a set of 3 colour values (red, green, blue) available for use by graphics modes |
Example:
00 00 00 14 76 6D 68 64 00 00 00 01 00 00 00 00 ; ....vmhd........
00 00 00 00 ; ....
Sound Media Header Box(smhd)
- aligned(8) class SoundMediaHeaderBox
- extends FullBox(‘smhd’, version = 0, 0) {
- template int(16) balance = 0;
- const unsigned int(16) reserved = 0;
- }
Field | Size (bytes) | Comment |
box size | 4 | size of the box |
box type | 4 | type of the box |
version | 1 | box version; 0 for this box |
flags | 3 | flags |
balance | 2 | stereo balance as an [8.8] fixed-point value, normally 0; -1.0 is full left and 1.0 is full right |
reserved | 2 | 0 |
Example:
00 00 00 10 73 6D 68 64 00 00 00 00 00 00 00 00 ; ....smhd........
Hint Media Header Box(hmhd)
- aligned(8) class HintMediaHeaderBox
- extends FullBox(‘hmhd’, version = 0, 0) {
- unsigned int(16) maxPDUsize;
- unsigned int(16) avgPDUsize;
- unsigned int(32) maxbitrate;
- unsigned int(32) avgbitrate;
- unsigned int(32) reserved = 0;
- }
2.2.2.3.2 Data Information Box
- aligned(8) class DataEntryUrlBox (bit(24) flags)
- extends FullBox(‘url ’, version = 0, flags) {
- string location;
- }
- aligned(8) class DataEntryUrnBox (bit(24) flags)
- extends FullBox(‘urn ’, version = 0, flags) {
- string name;
- string location;
- }
- aligned(8) class DataReferenceBox
- extends FullBox(‘dref’, version = 0, 0) {
- unsigned int(32) entry_count;
- for (i=1; i <= entry_count; i++) {
- DataEntryBox(entry_version, entry_flags) data_entry;
- }
- }
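In the formats covered earlier (FLV, MKV, ASF and so on), audio and video packets are generally interleaved in file order. Once the header has been parsed, the rest of the file can simply be read packet by packet. MP4 works on a completely different principle: the media description and the media data are stored separately, so the location of every frame has to be worked out before any data can be read. The per-frame information is kept in the stbl box, while the actual data is kept in mdat. The rest of this part explains how stbl maps onto mdat.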
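Before looking inside stbl, a reader first has to locate the top-level boxes (ftyp, moov, mdat). The C sketch below walks them in a buffer that holds the whole file. `box_ref` and `list_boxes` are hypothetical names. The handling of size == 1 (a 64-bit largesize follows the type) and size == 0 (the box runs to the end of the file) comes from the ISO base media file format rather than from anything shown in this article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Location of one top-level box (hypothetical helper type). */
typedef struct {
    char     type[5];   /* four-character code plus NUL */
    uint64_t offset;    /* offset of the box header in the file */
    uint64_t size;      /* total box size, header included */
} box_ref;

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Record up to max top-level boxes found in buf[0..len).
 * Returns the number recorded, or -1 if a box header is malformed. */
static int list_boxes(const uint8_t *buf, size_t len, box_ref *out, int max)
{
    size_t pos = 0;
    int n = 0;
    while (pos + 8 <= len && n < max) {
        uint64_t size = rd32(buf + pos);
        size_t header = 8;
        if (size == 1) {                  /* 64-bit largesize follows */
            if (pos + 16 > len)
                return -1;
            size = ((uint64_t)rd32(buf + pos + 8) << 32) |
                   rd32(buf + pos + 12);
            header = 16;
        } else if (size == 0) {           /* box extends to end of file */
            size = len - pos;
        }
        if (size < header || size > len - pos)
            return -1;
        memcpy(out[n].type, buf + pos + 4, 4);
        out[n].type[4] = '\0';
        out[n].offset = pos;
        out[n].size = size;
        n++;
        pos += (size_t)size;
    }
    return n;
}
```

The same loop can be reused on the payload of a container box such as moov or trak to reach stbl, since child boxes use the same header layout.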
Sample Table Box(stbl)
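stts: Decoding Time to Sample Box, the table that maps timestamps to samples.
stsd: Sample Description Box.
stsz, stz2: Sample Size Boxes, the table of the size of each sample.
stsc: the sample-to-chunk mapping table.
‘stco’, ‘co64’: the table of chunk offsets.
stss: the index of key frames.
1. Parsing stsd yields the decoder-related information: coding type, video width and height, audio samplesize, channelcount and so on.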
- aligned(8) class SampleDescriptionBox (unsigned int(32) handler_type)
- extends FullBox('stsd', 0, 0){
- int i ;
- unsigned int(32) entry_count;
- for (i = 1 ; i <= entry_count ; i++){
- switch (handler_type){
- case ‘soun’: // for audio tracks
- AudioSampleEntry();
- break;
- case ‘vide’: // for video tracks
- VisualSampleEntry();
- break;
- case ‘hint’: // Hint track
- HintSampleEntry();
- break;
- }
- }
- }
- aligned(8) abstract class SampleEntry (unsigned int(32) format)
- extends Box(format){
- const unsigned int(8)[6] reserved = 0;
- unsigned int(16) data_reference_index;
- }
- class HintSampleEntry() extends SampleEntry (protocol) {
- unsigned int(8) data [];
- }
- // Visual Sequences
- class VisualSampleEntry(codingname) extends SampleEntry (codingname){
- unsigned int(16) pre_defined = 0;
- const unsigned int(16) reserved = 0;
- unsigned int(32)[3] pre_defined = 0;
- unsigned int(16) width;
- unsigned int(16) height;
- template unsigned int(32) horizresolution = 0x00480000; // 72 dpi
- template unsigned int(32) vertresolution = 0x00480000; // 72 dpi
- const unsigned int(32) reserved = 0;
- template unsigned int(16) frame_count = 1;
- string[32] compressorname;
- template unsigned int(16) depth = 0x0018;
- int(16) pre_defined = -1;
- }
- // Audio Sequences
- class AudioSampleEntry(codingname) extends SampleEntry (codingname){
- const unsigned int(32)[2] reserved = 0;
- template unsigned int(16) channelcount = 2;
- template unsigned int(16) samplesize = 16;
- unsigned int(16) pre_defined = 0;
- const unsigned int(16) reserved = 0 ;
- template unsigned int(32) samplerate = {timescale of media}<<16;
- }
2. Parse stsz to get the size of each sample
- aligned(8) class SampleSizeBox extends FullBox(‘stsz’, version = 0, 0) {
- unsigned int(32) sample_size;
- unsigned int(32) sample_count;
- if (sample_size==0) {
- for (i=1; i <= sample_count; i++) {
- unsigned int(32) entry_size;
- }
- }
- }
3. Parse stts to get the table that maps timestamps to samples
- aligned(8) class TimeToSampleBox
- extends FullBox(‘stts’, version = 0, 0) {
- unsigned int(32) entry_count;
- int i;
- for (i=0; i < entry_count; i++) {
- unsigned int(32) sample_count;
- unsigned int(32) sample_delta;
- }
- }
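Each stts entry says "the next sample_count samples are each sample_delta time units long", so a sample's decoding time is the running sum of the deltas before it. A minimal C sketch of that lookup follows; `stts_entry` and `stts_decode_time` are names chosen for this example, and the entries are assumed to be already parsed into an array.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed stts entry. */
typedef struct {
    uint32_t sample_count;
    uint32_t sample_delta;
} stts_entry;

/* Decoding time, in media time-scale units, of the sample with the given
 * 0-based index. Returns -1 if the index is past the end of the table. */
static int64_t stts_decode_time(const stts_entry *e, size_t n, uint64_t sample)
{
    uint64_t t = 0;
    for (size_t i = 0; i < n; i++) {
        if (sample < e[i].sample_count)
            return (int64_t)(t + sample * e[i].sample_delta);
        /* the sample lies beyond this run: skip the whole run */
        sample -= e[i].sample_count;
        t += (uint64_t)e[i].sample_count * e[i].sample_delta;
    }
    return -1;
}
```

Dividing the result by the time scale from mdhd converts it to seconds.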
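4. Parse stsc to rebuild the sample-to-chunk mapping table
A sample is the basic unit of storage, and MP4 stores samples in chunks. The size of a chunk, the number of samples in it, and the size of each sample all vary.
Parsing this box rebuilds the mapping table.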
- aligned(8) class SampleToChunkBox
- extends FullBox(‘stsc’, version = 0, 0) {
- unsigned int(32) entry_count;
- for (i=1; i <= entry_count; i++) {
- unsigned int(32) first_chunk;
- unsigned int(32) samples_per_chunk;
- unsigned int(32) sample_description_index;
- }
- }
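Each entry describes a run of consecutive chunks that share the same characteristics, and entry_count is the number of such runs.
first_chunk is the 1-based index of the first chunk in the run; the run continues up to the chunk before the next entry's first_chunk.
Every chunk in the run contains the same number of samples, samples_per_chunk.
sample_description_index identifies the entry in the stsd box that describes the samples in these chunks.
FFmpeg's mov_read_stsc() stores this data in an array of structs for later use: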
- static int mov_read_stsc(MOVContext *c, AVIOContext *pb, MOVAtom atom)
- {
- AVStream *st;
- MOVStreamContext *sc;
- unsigned int i, entries;
- if (c->fc->nb_streams < 1)
- return 0;
- st = c->fc->streams[c->fc->nb_streams-1];
- sc = st->priv_data;
- avio_r8(pb); /* version */
- avio_rb24(pb); /* flags */
- entries = avio_rb32(pb);
- av_dlog(c->fc, "track[%i].stsc.entries = %i\n", c->fc->nb_streams-1, entries);
- if (!entries)
- return 0;
- if (entries >= UINT_MAX / sizeof(*sc->stsc_data))
- return AVERROR_INVALIDDATA;
- sc->stsc_data = av_malloc(entries * sizeof(*sc->stsc_data));
- if (!sc->stsc_data)
- return AVERROR(ENOMEM);
- for (i = 0; i < entries && !pb->eof_reached; i++) {
- sc->stsc_data[i].first = avio_rb32(pb);
- sc->stsc_data[i].count = avio_rb32(pb);
- sc->stsc_data[i].id = avio_rb32(pb);
- }
- sc->stsc_count = i;
- if (pb->eof_reached)
- return AVERROR_EOF;
- return 0;
- }
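Because stsc is run-length encoded, it is often convenient to expand it into a plain "samples in chunk N" array once the total chunk count is known from stco or co64. The C sketch below does that. It is independent of FFmpeg; `stsc_entry` and `stsc_expand` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed stsc entry. */
typedef struct {
    uint32_t first_chunk;               /* 1-based */
    uint32_t samples_per_chunk;
    uint32_t sample_description_index;
} stsc_entry;

/* Write the number of samples in each chunk into per_chunk[0..chunk_count).
 * Returns the total number of samples, or -1 if the table is inconsistent
 * (first entry not starting at chunk 1, runs out of order, or a run that
 * starts past the last chunk). */
static int64_t stsc_expand(const stsc_entry *e, size_t n,
                           uint32_t chunk_count, uint32_t *per_chunk)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t first = e[i].first_chunk;
        /* a run ends where the next one starts; the last run ends
         * after the final chunk */
        uint64_t next = (i + 1 < n) ? e[i + 1].first_chunk
                                    : (uint64_t)chunk_count + 1;
        if ((i == 0 && first != 1) || next <= first ||
            next > (uint64_t)chunk_count + 1)
            return -1;
        for (uint64_t c = first; c < next; c++) {
            per_chunk[c - 1] = e[i].samples_per_chunk;
            total += e[i].samples_per_chunk;
        }
    }
    return total;
}
```

For example, the entries (1, 3), (3, 1), (5, 2) with six chunks expand to 3, 3, 1, 1, 2, 2 samples per chunk, 12 samples in total.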
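5. Parse ‘stco’ and ‘co64’
‘stco’ gives the position of each chunk in the file. The offsets come in two widths, 32-bit (‘stco’) and 64-bit (‘co64’); the latter is useful for very large movies.
32-bit: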
- aligned(8) class ChunkOffsetBox
- extends FullBox(‘stco’, version = 0, 0) {
- unsigned int(32) entry_count;
- for (i=1; i <= entry_count; i++) {
- unsigned int(32) chunk_offset;
- }
- }
64-bit:
- aligned(8) class ChunkLargeOffsetBox
- extends FullBox(‘co64’, version = 0, 0) {
- unsigned int(32) entry_count;
- for (i=1; i <= entry_count; i++) {
- unsigned int(64) chunk_offset;
- }
- }
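1. Parsing ‘stco’ or ‘co64’ gives the chunk table: the total number of chunks and the file offset of each one.
2. Parsing stsc and combining it with the chunk table gives the sample-to-chunk mapping. Together with the sample sizes from stsz, this yields the position of every sample.
3. Add the stts timing table and the decoder information, and extracting the elementary stream (ES) is no longer a problem.
4. To get the indices of the key frames, parse ‘stss’: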
- aligned(8) class SyncSampleBox
- extends FullBox(‘stss’, version = 0, 0) {
- unsigned int(32) entry_count;
- int i;
- for (i=0; i < entry_count; i++) {
- unsigned int(32) sample_number;
- }
- }
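Steps 1 and 2 of the list above can be sketched in C as follows: given the chunk offsets (stco or co64), the stsc runs and the per-sample sizes (stsz), the function computes the file offset of every sample. All names here (`stsc_entry`, `sample_offsets`) are illustrative. The sketch assumes the tables are already parsed into arrays and that the caller has filled `sizes` with the constant sample_size when stsz uses one.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed stsc entry. */
typedef struct {
    uint32_t first_chunk;               /* 1-based */
    uint32_t samples_per_chunk;
    uint32_t sample_description_index;
} stsc_entry;

/* Fill offsets[0..sample_count) with the file offset of each sample.
 * Samples are laid out back to back inside a chunk, so each offset is
 * the chunk offset plus the sizes of the earlier samples in that chunk.
 * Returns 0 on success, -1 if the tables do not account for all samples. */
static int sample_offsets(const stsc_entry *stsc, size_t stsc_count,
                          const uint64_t *chunk_offsets, uint32_t chunk_count,
                          const uint32_t *sizes, uint32_t sample_count,
                          uint64_t *offsets)
{
    if (sample_count == 0)
        return 0;
    if (stsc_count == 0 || stsc[0].first_chunk != 1)
        return -1;
    uint32_t sample = 0;
    size_t run = 0;
    for (uint64_t chunk = 1;
         chunk <= chunk_count && sample < sample_count; chunk++) {
        /* move on to the stsc run that covers this chunk */
        while (run + 1 < stsc_count && stsc[run + 1].first_chunk <= chunk)
            run++;
        uint64_t pos = chunk_offsets[chunk - 1];
        for (uint32_t k = 0;
             k < stsc[run].samples_per_chunk && sample < sample_count; k++) {
            offsets[sample] = pos;
            pos += sizes[sample];
            sample++;
        }
    }
    return sample == sample_count ? 0 : -1;
}
```

With the offsets and sizes in hand, each sample can be read straight out of mdat, and the sample numbers listed in stss (which count from 1) mark which of those samples are key frames.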