音视频的时长怎么获取,音视频的封面怎么获取,音视频的格式怎么获取呢?这些信息都以特定格式存储在文件开头或者结尾,称为多媒体信息或者多媒体元数据。通用的封装格式由:文件标识头+多媒体信息+音视频(字幕)轨+视频帧索引块组成,如果是纯音频,后面可能还有歌词。音视频的封装格式就是通过解析文件标识头进行判断的,然后解析多媒体信息从而获取时长,再解析视频帧索引块,最后根据索引块去获取对应时间戳的视频帧。
音视频封装格式存储的字段包括:时长、码率、音视频编码器、分辨率(宽x高)、帧率、像素格式、旋转角度、采样率、声道数等等。其中视频专有的字段是分辨率、帧率、像素格式、旋转角度,而音频专有的字段是采样率、声道数。具体的字段解析可以参考:音视频的基本概念。
常见的视频格式有:mp4、mov、3gp、mkv、webm、flv、avi、mpg、wmv、ts等等。其中mp4、mov、3gp同属一个协议簇,目前mp4最为流行,mp4全称为MPEG-4,由国际标准化组织和国际电工委员会下属的动态图像专家组(Moving Picture Experts Group)制定,具体协议可参考:ISO/IEC14496-14协议;mkv与webm公用封装格式:matroska,对于高清视频而言,mkv/webm最受欢迎;而avi是比较古老的格式,音视频流交错(Audio Video Interleave),可以封装各种编码格式的音视频流;mpg属于ps的一种封装格式;wmv(Windows Media Video)是微软推出的视频编解码格式统称,采用ASF(Advance System Format)作为容器,基于Object对象进行封装;而ts的全称为MPEG2-TS,即为Transport Stream的缩写,具体可参考ISO/IEC13818-1协议,作用于传输层,主要用于实时传输的节目,HLS直播协议就是基于ts切片来传输视频流的,主要特点是从视频流任一片段都可独立解码播放;ps与ts类似,全称为MPEG-PS,即为Program Stream的缩写,用于存储固定时长的节目。视频格式如下图所示:
整个解封装流程:从读取文件头判断视频格式开始,然后选择对应的Extractor,解析多媒体信息,再解析视频帧的索引块,最后根据索引去定位并读取音视频数据。如下图所示:
图1—视频解封装流程图
mp4作为目前最流行的视频封装格式,也是本篇文章的男一号主角,下面将围绕mp4格式进行展开分析。mp4是由一系列的box组成(在quick time协议中,称为atom),box又由Header和Data组成,box的结构如图2所示:
图2—box的宏观结构
而Header由size、type、largeSize、extendType组成,其中size和type是必要字段,如表1所示:
size | type | largeSize | extendType |
4 bytes | 4 bytes | 8 bytes | 16 bytes |
表1—通用Header结构
full box的Header多两个字段:version、flag,一般是track box采用full box形式,如表2所示:
size | type | largeSize | extendType | version | flag |
4 bytes | 4 bytes | 8 bytes | 16 bytes | 1 byte | 3 bytes |
表2—full box的Header结构
box分为normal box、full box、large box、extend box。如果size为1,那么表明该box为large box,使用largeSize来存储box的大小;如果size为0,那么表明该box是文件的最后一个box;如果box的类型为uuid,那么表明该box是扩展box。如下面代码段所示:
aligned(8) class Box (unsigned int(32) boxtype,
optional unsigned int(8)[16] extended_type) {
unsigned int(32) size;
unsigned int(32) type = boxtype;
if (size==1) {
unsigned int(64) largesize;
} else if (size==0) {
// box extends to end of file
}
if (boxtype==‘uuid’) {
unsigned int(8)[16] usertype = extended_type;
}
}
moov box作为mp4格式的重要组成部分,根据moov box与mdat box的相对位置,分为moov前置和moov后置。如下面图3、图4所示:
图3—mp4结构图(moov前置)
图4—mp4结构图(moov后置)
通常情况下,mp4的moov都是在mdat前面的;一般只有实时录制的mp4视频,moov才在mdat的后面。除了moov box,还有ftyp box、moof box、mdat box、free box、meta box等等,如下表所示,对各种box的介绍与描述:
ftyp |
|
|
|
|
| √ | file type and compatibility |
pdin |
|
|
|
|
|
| progressive download information |
moov |
|
|
|
|
| √ | container for all the metadata |
| mvhd |
|
|
|
| √ | movie header, overall declarations |
| trak |
|
|
|
| √ | container for an individual track or stream |
|
| tkhd |
|
|
| √ | track header, overall information about the track |
|
| tref |
|
|
|
| track reference container |
|
| edts |
|
|
|
| edit list container |
|
|
| elst |
|
|
| an edit list |
|
| mdia |
|
|
| √ | container for the media information in a track |
|
|
| mdhd |
|
| √ | media header, overall information about the media |
|
|
| hdlr |
|
| √ | handler, declares the media (handler) type |
|
|
| minf |
|
| √ | media information container |
|
|
|
| vmhd |
|
| video media header, overall information (video track only) |
|
|
|
| smhd |
|
| sound media header, overall information (sound track only) |
|
|
|
| hmhd |
|
| hint media header, overall information (hint track only) |
|
|
|
| nmhd |
|
| Null media header, overall information (some tracks only) |
|
|
|
| dinf |
| √ | data information box, container |
|
|
|
|
| dref | √ | data reference box, declares source(s) of media data in track |
|
|
|
| stbl |
| √ | sample table box, container for the time/space map |
|
|
|
|
| stsd | √ | sample descriptions (codec types, initialization etc.) |
|
|
|
|
| stts | √ | (decoding) time-to-sample |
|
|
|
|
| ctts |
| (composition) time to sample |
|
|
|
|
| stsc | √ | sample-to-chunk, partial data-offset information |
|
|
|
|
| stsz |
| sample sizes (framing) |
|
|
|
|
| stz2 |
| compact sample sizes (framing) |
|
|
|
|
| stco | √ | chunk offset, partial data-offset information |
|
|
|
|
| co64 |
| 64-bit chunk offset |
|
|
|
|
| stss |
| sync sample table (random access points) |
|
|
|
|
| stsh |
| shadow sync sample table |
|
|
|
|
| padb |
| sample padding bits |
|
|
|
|
| stdp |
| sample degradation priority |
|
|
|
|
| sdtp |
| independent and disposable samples |
|
|
|
|
| sbgp |
| sample-to-group |
|
|
|
|
| sgpd |
| sample group description |
|
|
|
|
| subs |
| sub-sample information |
| mvex |
|
|
|
|
| movie extends box |
|
| mehd |
|
|
|
| movie extends header box |
|
| trex |
|
|
| √ | track extends defaults |
| ipmc |
|
|
|
|
| IPMP Control Box |
moof |
|
|
|
|
|
| movie fragment |
| mfhd |
|
|
|
| √ | movie fragment header |
| traf |
|
|
|
|
| track fragment |
|
| tfhd |
|
|
| √ | track fragment header |
|
| trun |
|
|
|
| track fragment run |
|
| sdtp |
|
|
|
| independent and disposable samples |
|
| sbgp |
|
|
|
| sample-to-group |
|
| subs |
|
|
|
| sub-sample information |
mfra |
|
|
|
|
|
| movie fragment random access |
| tfra |
|
|
|
|
| track fragment random access |
| mfro |
|
|
|
| √ | movie fragment random access offset |
mdat |
|
|
|
|
|
| media data container |
free |
|
|
|
|
|
| free space |
skip |
|
|
|
|
|
| free space |
| udta |
|
|
|
|
| user-data |
|
| cprt |
|
|
|
| copyright etc. |
meta |
|
|
|
|
|
| metadata |
| hdlr |
|
|
|
| √ | handler, declares the metadata (handler) type |
| dinf |
|
|
|
|
| data information box, container |
|
| dref |
|
|
|
| data reference box, declares source(s) of metadata items |
| ipmc |
|
|
|
|
| IPMP Control Box |
| iloc |
|
|
|
|
| item location |
| ipro |
|
|
|
|
| item protection |
|
| sinf |
|
|
|
| protection scheme information box |
|
|
| frma |
|
|
| original format box |
|
|
| imif |
|
|
| IPMP Information box |
|
|
| schm |
|
|
| scheme type box |
|
|
| schi |
|
|
| scheme information box |
| iinf |
|
|
|
|
| item information |
| xml |
|
|
|
|
| XML container |
| bxml |
|
|
|
|
| binary XML container |
| pitm |
|
|
|
|
| primary item reference |
| fiin |
|
|
|
|
| file delivery item information |
|
| paen |
|
|
|
| partition entry |
|
|
| fpar |
|
|
| file partition |
|
|
| fecr |
|
|
| FEC reservoir |
|
| segr |
|
|
|
| file delivery session group |
|
| gitn |
|
|
|
| group id to name |
|
| tsel |
|
|
|
| track selection |
meco |
|
|
|
|
|
| additional metadata container |
| mere |
|
|
|
|
| metabox relation |
表3—mp4的各种box描述
ftyp box:mp4视频标识头,包含major brand、minor version、compatible brands。其中major brand一般为isom,而compatible brands包括isom、iso2、avc、mp41、mp42等。
moov box:存储多媒体信息,嵌套着movie box(mvhd)、track box(trak box)、usedata box(udat);而track box分为视频轨、音频轨、字幕轨,如果有多语言,就会对应有多音轨;trak/mdia/minf/stbl/stsd存储的是音视频编码器信息,比如视频轨的是avc,音频轨是mp4a;trak/mdia/minf/stbl/stsz存储的是视频帧size;trak/mdia/minf/stbl/stco存储的是chunk offset。
mdat box:音视频数据,根据moov及其嵌套box解析出来的视频帧索引,去定位关键帧,然后根据帧类型读取音视频数据。
音视频学习和音视频处理项目可参考:https://github.com/xufuji456/FFmpegAndroid