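To learn how to package H.264 into an fMP4 file, I am starting with the reverse problem: how to parse the H.264 data back out of an fMP4 file.
An fMP4 file is laid out as ftyp + moov + (moof + mdat) * N.
Box definition: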
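Every box starts with a 32-bit big-endian size (which includes the header itself) followed by a four-character type. Here is a minimal sketch of walking a run of boxes. The name `iter_boxes` and the test bytes are my own, not taken from any library or from the file analysed here.

```python
import struct

def iter_boxes(buf, start=0, end=None):
    """Yield (type, payload_offset, payload_size) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # size == 1 means a 64-bit largesize follows the type field
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            # size == 0 means the box runs to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            break  # truncated or malformed; stop instead of looping forever
        yield box_type.decode("ascii", "replace"), pos + header, size - header
        pos += size


# Synthetic input (not from the analysed file): an empty 'moov' and an empty 'mdat'.
sample = struct.pack(">I4s", 8, b"moov") + struct.pack(">I4s", 8, b"mdat")
print([box_type for box_type, _, _ in iter_boxes(sample)])
```

Container boxes such as moov, trak and moof can be walked the same way by calling `iter_boxes` again on the payload range of the parent.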
ftyp format:
Byte stream:
size: 00 00 00 1C ===> 28
type: 66 74 79 70 ===> ftyp
major_brand: 6D 70 34 32 ===> mp42
minor_version: 00 00 00 01 ===> 1
compatible_brands: 6D 70 34 32 ===> mp42 || 61 76 63 31 ===> avc1 || 69 73 6F 35 ===> iso5
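The 28 bytes above can be decoded with a few lines of Python. This is only a sketch: `parse_ftyp` is a name I made up, and the input is simply the hex from the dump typed back in.

```python
import struct

FTYP = bytes.fromhex(
    "0000001C" "66747970"             # size = 28, type = 'ftyp'
    "6D703432" "00000001"             # major_brand = 'mp42', minor_version = 1
    "6D703432" "61766331" "69736F35"  # compatible_brands
)

def parse_ftyp(box):
    """Decode a complete ftyp box (header included)."""
    size, box_type, major, minor = struct.unpack_from(">I4s4sI", box, 0)
    if box_type != b"ftyp":
        raise ValueError("not an ftyp box")
    # Everything after the first 16 bytes is a list of 4-byte brands.
    brands = [box[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return {
        "size": size,
        "major_brand": major.decode("ascii"),
        "minor_version": minor,
        "compatible_brands": brands,
    }

print(parse_ftyp(FTYP))
```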
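**************************************** moov boxes start here ***********************************
moov format:
The moov box is quite complex and carries most of the media metadata.
Start with the top level.
moov opens with an 8-byte header:
size: 00 00 07 57 ===> 1879
type: 6D 6F 6F 76 ===> moov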
The mvhd box follows immediately:
size: 00 00 00 6C ===> 108
type: 6D 76 68 64 ===> mvhd
version + flags: 00 00 00 00 ===> 0
creation_time: 00 00 00 00
modification_time: 00 00 00 00
timescale: 00 00 03 E8 ===> 1000
duration: 00 00 EA BF ===> 60095
rate: 00 01 00 00 ===> 1.0 (16.16 fixed point)
volume: 01 00 ===> 1.0 (8.8 fixed point)
reserved16: 00 00
reserved32: 00 00 00 00 00 00 00 00
matrix: 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
00 00 00 00 00 00 00 00 40 00 00 00
pre_defined: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
next_track_ID: FF FF FF FF
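The duration only makes sense together with the timescale: 60095 ticks at 1000 ticks per second is about 60 seconds. A small sketch of reading the first fields of a version 0 mvhd payload follows. `parse_mvhd_v0` is a hypothetical helper, and the bytes are copied from the dump above.

```python
import struct

# First 26 bytes of the mvhd payload (everything after the 8-byte box header).
MVHD_HEAD = bytes.fromhex(
    "00000000"             # version + flags
    "00000000" "00000000"  # creation_time, modification_time
    "000003E8" "0000EABF"  # timescale, duration
    "00010000" "0100"      # rate (16.16 fixed point), volume (8.8 fixed point)
)

def parse_mvhd_v0(payload):
    """Read timescale, duration, rate and volume from a version 0 mvhd payload."""
    _vf, _ctime, _mtime, timescale, duration, rate, volume = struct.unpack_from(
        ">IIIIIIH", payload, 0
    )
    return {
        "timescale": timescale,
        "duration": duration,
        "duration_seconds": duration / timescale,
        "rate": rate / 65536,   # 16.16 fixed point
        "volume": volume / 256,  # 8.8 fixed point
    }

print(parse_mvhd_v0(MVHD_HEAD))
```

A version 1 mvhd uses 64-bit times and duration, so this sketch would need a different format string for that case.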
Next comes the trak box:
size: 00 00 01 A4 ===> 420
type: 74 72 61 6B ===> trak
tkhd definition:
Byte stream:
size: 00 00 00 5C ===> 92
type: 74 6B 68 64 ===> tkhd
version: 00 ===> 0
flags: 00 00 07 ===> 7
creation_time: 00 00 00 00 ===> 0
modification_time: 00 00 00 00 ===> 0
track_ID: 00 00 00 01 ===> 1
reserved0: 00 00 00 00
duration: 00 00 EA BF ===> 60095
reserved1: 00 00 00 00 00 00 00 00
layer: 00 00
alternate_group: 00 00
volume: 00 01
reserved2: 00 00
matrix: 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
04 00 00 00
width: 00 00 00 00
height: 00 00 00 00
mdia:
size: 00 00 01 40 ===> 320
type: 6D 64 69 61 ===> mdia
mdhd definition:
Byte stream:
size: 00 00 00 20 ===> 32
type: 6D 64 68 64 ===> mdhd
version + flags: 00 00 00 00 ===> 0
creation_time: 00 00 00 00
modification_time: 00 00 00 00
timescale: 00 00 56 22 ===> 22050
duration: 00 00 00 00 ===> 0
pad: 0
language: 00101 01110 00111 ===> eng
pre_defined ===> 0
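The language field packs three lowercase letters into 15 bits: each 5-bit group is the character code minus 0x60, and the top bit of the 16-bit word is padding. A short sketch, with `decode_language` as my own helper name:

```python
def decode_language(packed):
    """Decode the 16-bit mdhd language word (1 pad bit + three 5-bit letters)."""
    return "".join(chr(((packed >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))

# pad = 0, then 00101 01110 00111 as in the dump above
packed = 0b0_00101_01110_00111
print(hex(packed), decode_language(packed))
```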
hdlr definition:
Byte stream:
size: 00 00 00 35 ===> 53
type: 68 64 6C 72 ===> hdlr
version + flags: 00 00 00 00
pre_defined: 00 00 00 00
handler_type: 73 6F 75 6E ===> soun
reserved: 0
name: Bento4 Sound Handler
minf definition:
size: 00 00 00 E3 ===> 227
type: 6D 69 6E 66 ===> minf
smhd definition:
Byte stream:
size: 00 00 00 10 ===> 16
type: 73 6D 68 64 ===> smhd
version: 0
balance: 0
reserved: 0
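dinf definition:
size: 00 00 00 24 ===> 36
type: 64 69 6E 66 ===> dinf
dref definition:
Byte stream:
size: 00 00 00 1C ===> 28
type: 64 72 65 66 ===> dref
version + flags: 00 00 00 00
entry_count: 00 00 00 01
url definition:
Byte stream:
size: 00 00 00 0C ===> 12
type: 75 72 6C 20 ===> "url " (with a trailing space)
version + flags: 00 00 00 01 ===> version 0, flags 0x000001
location: same file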
stbl definition:
![](https://i-blog.csdnimg.cn/blog_migrate/0be53600e43de1234e78853cf30dfbf1.png)
size: 00 00 00 A7 ===> 167
type: 73 74 62 6C ===> stbl
stsd definition:
Byte stream:
size: 00 00 00 5B ===> 91
type: 73 74 73 64 ===> stsd
version + flags: 00 00 00 00
entry_count: 00 00 00 01
mp4a definition:
Byte stream:
size: 00 00 00 4B ===> 75
type: 6D 70 34 61 ===> mp4a
reserved[6]: 00 00 00 00 00 00
data_reference_index: 00 01
reserved1: 00 00 00 00 00 00 00 00
channelcount: 00 02
samplesize: 16
pre_defined: 00 00
reserved2: 00 00
samplerate: 22050
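I could not find a definition for the esds box; if anyone knows where it is specified, please let me know. Straight to the byte stream:
Screenshot from a parsing tool:
See https://stackoverflow.com/questions/3987850/mp4-atom-how-to-discriminate-the-audio-codec-is-it-aac-or-mp3
the value of the 11th byte
- 0x40 - MPEG-4 Audio
- 0x6B - MPEG-1 Audio (MPEG-1 Layers 1, 2, and 3)
- 0x69 - MPEG-2 Backward Compatible Audio (MPEG-2 Layers 1, 2, and 3)
- 0x67 - MPEG-2 AAC LC
After some searching I found an explanation, although I still have not tracked down its original source. (Quoted from https://blog.csdn.net/evsqiezi/article/details/73920290)
The esds payload has three nested layers, each containing the next: MP4ESDescr (starts with 0x03, usually 7 bytes), MP4DecConfigDescr (starts with 0x04, usually 13 bytes), and MP4DecSpecificDescr.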
Example esds box analysis:
Here is a chunk of esds data:
00001e7: 0000 0027 6573 6473 0000 0000 0319 0000 ...'esds........
00001f7: 0004 1140 1500 01f8 0001 2728 0000 f3e8 ...@......'(....
0000207: 0502 1388 0601 02 .......
Breakdown:
0000 0027: :esds box length, 39
6573 6473: :esds box type: esds
00 :version is 0
00 0000: :flags are 0
03 :ES_DescrTag, see 14496-1 Table 1
19 :length field: 25
0000: :ES_ID is 0
00 :00(hex) =
:0000 0000(bits)
:0 :streamDependenceFlag; if 1, a 16-bit dependsOn_ES_ID follows
: 0 :URL_Flag; if 1, an 8-bit URLlength and the URLstring (URLlength bytes) follow
: 0 :OCRstreamFlag; if 1, a 16-bit OCR_ES_Id follows
: 0 0000 :streamPriority
04 :DecoderConfigDescriptor tag
11 :length field: 17
40: :objectTypeIndication, 14496-1 Table 8; 0x40 is Audio ISO/IEC 14496-3
15 :15(hex) =
:0001 0101
:0001 01 :streamType; 5 is Audio Stream, 14496-1 Table 9
: 0 :upStream
: 1 :reserved
00 01f8: :bufferSizeDB 504
0001 2728: :maxBitrate 75560 // the maximum bitrate
0000 f3e8: :avgBitrate 62440 // the average bitrate
05 :DecSpecificInfoTag
02 :length field: 2
1388 :14496-3 1.6
:1388(hex)=
:0001 0011 1000 1000(bit)
:0001 0 :audioObjectType 2 (AAC LC, which uses GASpecificConfig)
: 011 1 :samplingFrequencyIndex 7 (22050 Hz)
: 000 1 :channelConfiguration 1
: 0 :frameLengthFlag
: 0 :dependsOnCoreCoder
: 0 :extensionFlag
06 :SLConfigDescrTag
01 :length field: 1
02 :predefined 0x02, reserved for use in MP4 files
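The two DecSpecificInfo bytes (13 88) are the part a decoder needs most, so here is a sketch that unpacks them. It assumes the simple case only (audioObjectType below 31 and no explicit sampling frequency); `parse_audio_specific_config` is a name I chose, and the frequency table is the standard MPEG-4 sampling frequency index table.

```python
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]

def parse_audio_specific_config(data):
    """Decode the first 16 bits of an AudioSpecificConfig (simple case only)."""
    bits = int.from_bytes(data[:2], "big")
    audio_object_type = bits >> 11        # top 5 bits
    freq_index = (bits >> 7) & 0x0F       # next 4 bits
    channel_config = (bits >> 3) & 0x0F   # next 4 bits
    if audio_object_type == 31 or freq_index >= len(SAMPLE_RATES):
        # Escape values need extra bits that this sketch does not read.
        raise ValueError("extended AudioSpecificConfig not handled here")
    return {
        "audio_object_type": audio_object_type,
        "sampling_frequency": SAMPLE_RATES[freq_index],
        "channel_configuration": channel_config,
    }

print(parse_audio_specific_config(bytes.fromhex("1388")))
```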
stsz definition:
Byte stream:
size: 00 00 00 14 ===> 20
type: 73 74 73 7A ===> stsz
version: 00
flags: 00 00 00
sample_size: 00 00 00 00
sample_count: 00 00 00 00
stsc definition:
Byte stream:
size: 00 00 00 10 ===> 16
type: 73 74 73 63 ===> stsc
version: 00
flags: 00 00 00
entry_count: 00 00 00 00
stts definition:
Byte stream:
size: 00 00 00 10 ===> 16
type: 73 74 74 73 ===> stts
version: 00
flags: 00 00 00
entry_count: 00 00 00 00
stco definition:
Byte stream:
size: 00 00 00 10 ===> 16
type: 73 74 63 6F ===> stco
version: 00
flags: 00 00 00
entry_count: 00 00 00 00
vmhd definition:
Byte stream:
size: 00 00 00 14 ===> 20
type: 76 6D 68 64 ===> vmhd
version: 00
flags: 00 00 00
graphicsmode: 00 00
opcolor: 00 00 00 00 00 00
avc1 definition:
Byte stream:
size: 00 00 00 82 ===> 130
type: 61 76 63 31 ===> avc1
reserved: 00 00 00 00 00 00
data_reference_index: 00 01
pre_defined0: 00 00
reserved1: 00 00
pre_defined1: 00 00 00 00 00 00 00 00 00 00 00 00
width: 02 80 ===> 640
height: 01 68 ===> 360
horizresolution: 00 48 00 00
vertresolution: 00 48 00 00
reserved2: 00 00 00 00
frame_count: 00 01
compressorname: 00 00 00 00 ... 00 00 (32 bytes, all zero)
depth: 00 18
pre_defined2: FF FF
I could not find a specification for avcC.
Reference: (blog post https://blog.csdn.net/stn_lcd/article/details/74520031)
bits
8 version ( always 0x01 )
8 avc profile ( sps[0][1] )
8 avc compatibility ( sps[0][2] )
8 avc level ( sps[0][3] )
6 reserved ( all bits on )
2 NALULengthSizeMinusOne // length-prefix size minus 1: a value of 3 means a 4-byte prefix, because 4 - 1 = 3
3 reserved ( all bits on )
5 number of SPS NALUs (usually 1)
repeated once per SPS:
16 SPS size
variable SPS NALU data
8 number of PPS NALUs (usually 1)
repeated once per PPS:
16 PPS size
variable PPS NALU data
version: 01
sps[0][1]: 0x42 ===> 66 (Baseline profile)
sps[0][2]: 0xE0
sps[0][3]: 0x1E ===> 30 (level 3.0)
reserved: 111111
NALULengthSizeMinusOne: 11 ===> 3 (4-byte length prefix)
reserved: 000
number of SPS NALUs: 00001 ===> 1
SPS size: 0x00 0x15 ===> 21
SPS: 0x27 0x42 0xE0 0x1E 0xA9 0x18 0x14 0x05 0xFF 0x2E 0x00 0xD4 0x18 0x04 0x1A 0xDB 0x0A 0xD7 0xBD 0xF0 0x10
number of PPS NALUs: 01 ===> 1
PPS size: 00 04 ===> 4
PPS: 28 DE 09 C0
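Putting the layout and the values together, the avcC payload can be decoded as in the sketch below. `parse_avcc` is a hypothetical helper, and the input bytes are rebuilt from the field values listed above (the real box also has an 8-byte header in front, which is left out here).

```python
import struct

AVCC = bytes.fromhex(
    "01" "42" "E0" "1E"  # version, profile, compatibility, level
    "FF"                 # 111111 reserved + NALULengthSizeMinusOne = 3
    "01"                 # 000 reserved (as dumped) + 1 SPS
    "0015"               # SPS size = 21
    "2742E01EA9181405FF2E00D418041ADB0AD7BDF010"
    "01"                 # 1 PPS
    "0004" "28DE09C0"    # PPS size = 4, PPS data
)

def parse_avcc(data):
    """Decode an avcC payload (box header already stripped)."""
    version, profile, compat, level, b4, b5 = struct.unpack_from(">6B", data, 0)
    pos = 6
    sps_list = []
    for _ in range(b5 & 0x1F):  # low 5 bits: number of SPS NALUs
        (size,) = struct.unpack_from(">H", data, pos)
        sps_list.append(data[pos + 2:pos + 2 + size])
        pos += 2 + size
    pps_count = data[pos]
    pos += 1
    pps_list = []
    for _ in range(pps_count):
        (size,) = struct.unpack_from(">H", data, pos)
        pps_list.append(data[pos + 2:pos + 2 + size])
        pos += 2 + size
    return {
        "version": version,
        "profile": profile,
        "compatibility": compat,
        "level": level,
        "nalu_length_size": (b4 & 0x03) + 1,  # low 2 bits, plus one
        "sps": sps_list,
        "pps": pps_list,
    }

info = parse_avcc(AVCC)
print(info["profile"], info["level"], info["nalu_length_size"], len(info["sps"][0]))
```

The SPS and PPS recovered here are what a decoder needs before the first frame, and `nalu_length_size` tells you how to split the samples stored in mdat.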
**************************************** moov boxes end here ***********************************
******************************* moof starts here ***********************************************
moof definition:
Byte stream:
size: 00 00 02 60 ===> 608
type: 6D 6F 6F 66 ===> moof
mfhd definition:
Byte stream:
size: 00 00 00 10 ===> 16
type: 6D 66 68 64 ===> mfhd
version: 00
flags: 00 00 00
sequence_number: 00 00 00 01 ===> 1
traf definition:
Byte stream:
size: 00 00 02 08 ===> 520
type: 74 72 61 66 ===> traf
tfhd definition:
Byte stream:
![](https://i-blog.csdnimg.cn/blog_migrate/d123d1e2b5268f251e3b8b4b3af07339.png)
size: 00 00 00 10 ===> 16
type: 74 66 68 64 ===> tfhd
version: 00
flags: 00 00 00
track_id: 00 00 00 01 ===> 1
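tfdt definition:
![](https://i-blog.csdnimg.cn/blog_migrate/f42375a5dff9f63e9d1ec8c7675bc112.png)
Byte stream:
size: 00 00 00 14 ===> 20
type: 74 66 64 74 ===> tfdt
version: 01
flags: 00 00 00
baseMediaDecodeTime: 00 00 00 00 00 00 00 00
This timestamp matters a great deal, because the browser schedules decoding based on it. By my calculation, a 4-byte baseMediaDecodeTime tops out at about 13 hours of playback, so I later switched to the 8-byte form. Note that version must then be set to 01.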
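The 13-hour limit depends on the track timescale, which is not stated for that calculation. The sketch below assumes a 90 kHz timescale (my assumption), which gives roughly that figure, and also shows how the version 1 box with the 8-byte field can be written. `max_hours` and `build_tfdt` are illustrative names.

```python
import struct

def max_hours(field_bits, timescale):
    """Longest time representable by an unsigned field of field_bits bits."""
    return (2 ** field_bits) / timescale / 3600

def build_tfdt(base_media_decode_time, version=1):
    """Serialize a tfdt box; version 1 uses a 64-bit time, version 0 a 32-bit time."""
    if version == 1:
        # version byte 0x01 followed by three zero flag bytes
        return struct.pack(">I4sIQ", 20, b"tfdt", 0x01000000, base_media_decode_time)
    return struct.pack(">I4sII", 16, b"tfdt", 0x00000000, base_media_decode_time)

print(round(max_hours(32, 90000), 2))  # assumed 90 kHz timescale
print(build_tfdt(0).hex())
```

With a smaller timescale the 32-bit limit is much further away, so the switch to version 1 mainly matters for high-resolution timescales or very long streams.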
trun definition:
Byte stream:
size: 00 00 02 18 ===> 536
type: 74 72 75 6E ===> trun
version: 00
flags: 00 03 05
sample_count: 00 00 00 40 ===> 64
data_offset: 00 00 02 68 ===> 616 == total moof size + 8
first_sample_flags: 02 00 00 00 ===> 33554432
sample_duration: 00 00 00 19 ===> 25
sample_size: 00 00 79 8B ===> 31115
sample_flags: (absent, flag 0x000400 is not set)
sample_composition_time_offset: (absent, flag 0x000800 is not set)
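Which fields appear in trun is decided entirely by the flags, so the box size can be predicted from the flags and sample_count. The following sketch checks that against the dump: flags 0x000305 with 64 samples should give 536 bytes. `trun_layout` is a name I made up, and the bit values are the standard trun flag bits.

```python
TRUN_FLAGS = {
    0x000001: "data_offset",
    0x000004: "first_sample_flags",
    0x000100: "sample_duration",
    0x000200: "sample_size",
    0x000400: "sample_flags",
    0x000800: "sample_composition_time_offset",
}
ONCE_PER_BOX = {"data_offset", "first_sample_flags"}

def trun_layout(flags, sample_count):
    """Return (fields present, expected box size in bytes) for a trun box."""
    present = [name for bit, name in TRUN_FLAGS.items() if flags & bit]
    once = sum(1 for name in present if name in ONCE_PER_BOX)
    per_sample = len(present) - once
    # 8-byte header + version/flags + sample_count, then the optional fields
    size = 8 + 4 + 4 + 4 * once + 4 * per_sample * sample_count
    return present, size

print(trun_layout(0x000305, 64))
```

Here data_offset and first_sample_flags are stored once, while sample_duration and sample_size repeat for each of the 64 samples.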
******************************* mdat starts here ***********************************************
mdat is relatively simple.
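mdat is an 8-byte box header followed by the raw sample data. For an avc1 track each sample is a run of NAL units, and each NAL unit is prefixed with a big-endian length of NALULengthSizeMinusOne + 1 bytes instead of an Annex B start code. A minimal sketch of converting one sample to Annex B follows. The function name and the test bytes are my own, and the sample is synthetic rather than taken from the file above.

```python
START_CODE = b"\x00\x00\x00\x01"

def length_prefixed_to_annexb(sample, nalu_length_size=4):
    """Replace each NALU length prefix in one sample with an Annex B start code."""
    out = bytearray()
    pos = 0
    while pos < len(sample):
        if pos + nalu_length_size > len(sample):
            raise ValueError("truncated NALU length prefix")
        nalu_len = int.from_bytes(sample[pos:pos + nalu_length_size], "big")
        pos += nalu_length_size
        if pos + nalu_len > len(sample):
            raise ValueError("NALU runs past the end of the sample")
        out += START_CODE + sample[pos:pos + nalu_len]
        pos += nalu_len
    return bytes(out)

# Synthetic sample: a 2-byte NALU followed by a 1-byte NALU.
sample = b"\x00\x00\x00\x02\x67\x42" + b"\x00\x00\x00\x01\x68"
print(length_prefixed_to_annexb(sample).hex())
```

To get a playable raw H.264 stream, the SPS and PPS from avcC also have to be written out with start codes before the first converted sample.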