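ISO/IEC 14496-12 Translation (ISO Base Media File Format)

Introduction

The ISO Base Media File Format is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates interchange, management, editing, and presentation of the media. The presentation may be local to the system containing it, or it may be delivered via a network or other stream delivery mechanism.

The file structure is object-oriented: a file can be decomposed into its constituent objects very simply, and the structure of each object is inferred directly from its type.

The file format is designed to be independent of any particular network protocol, while still enabling efficient support for such protocols in general.

The ISO Base Media File Format is a base format for media file formats.

It is intended that the ISO Base Media File Format be jointly maintained by WG1 and WG11. Consequently, ISO/IEC 15444-12 and ISO/IEC 14496-12 were created as a subdivision of the work, to facilitate joint maintenance.

… (a large amount of explanatory material is skipped here)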

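Terms, definitions and abbreviated terms

3.1 Terms and definitions

box: object-oriented building block defined by a unique type identifier and a length.

chunk: contiguous set of samples for one track.

container box: box whose sole purpose is to contain and group a set of related boxes.

hint track: special track that does not contain media data, but instead contains instructions for packaging one or more tracks into a streaming channel.

hinter: tool that is run on a file containing only media, to add one or more hint tracks to the file.

ISO Base Media File: name of the files conforming to the file format described in this specification.

media data box: box that holds the actual media data for a presentation ('mdat').

movie box: container box whose sub-boxes define the metadata for a presentation ('moov').

movie-fragment relative addressing: signalling of offsets for media data in movie fragments relative to the start of those movie fragments; specifically, setting the flag base-data-offset-present to 0 and default-base-is-moof to 1 in Track Fragment Header Boxes.

presentation: one or more motion sequences, possibly combined with audio.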

random access point (RAP): sample in a track that starts at the ISAU of a SAP of type 1 or 2 or 3 as defined in Annex I; informally, a sample from which, when decoding starts, the sample itself and all samples following in composition order can be correctly decoded. (Translator's note: not yet understood…)

random access recovery point: sample in a track with presentation time equal to the TSAP of a SAP of type 4 as defined in Annex I; informally, a sample that can be correctly decoded after having decoded a number of samples that are before this sample in decoding order, sometimes known as gradual decoding refresh. (Translator's note: not yet understood…)

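sample: all the data associated with a single timestamp.

sample description: structure that defines and describes the format of some number of samples in a track.

sample table: directory of the timing and physical layout of the samples in a track.

sync sample: …

segment: portion of an ISO base media file format file, consisting of either a movie box, with its associated media data and other associated boxes, or one or more movie fragment boxes, with their associated media data and other associated boxes.

subsegment: time interval of a segment formed from movie fragment boxes, that is also a valid segment.

track: timed sequence of related samples in an ISO base media file.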

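Object-structured File Organization

File Structure

Files are formed as a series of objects, called boxes in this specification. All data is contained in boxes; there is no other data within the file. This includes any initial signature required by the specific file format.

All object-structured files conformant to this specification shall contain a File Type Box.

Object Structure

An object, in this terminology, is a box.

Boxes start with a header that gives both size and type. The header permits compact or extended size (32 or 64 bits) and compact or extended types (32 bits, or full Universal Unique IDentifiers, i.e. UUIDs). The standard boxes all use compact types (32-bit), and most boxes will use the compact (32-bit) size. Typically only the Media Data Box(es) need the 64-bit size.

The size is the entire size of the box, including the size and type header, the fields, and all contained boxes. This facilitates general parsing of the file.

The boxes are defined using the syntax description language (SDL) defined in MPEG-4 (see the reference in Clause 2). The code fragments below use that notation.

The fields in the objects are stored with the most significant byte first, commonly known as network byte order or big-endian format. When a field is smaller than a byte, or spans a byte boundary, the bits are assigned from the most significant bits in each byte to the least significant. For example, a field of two bits followed by a field of six bits has the two bits in the high-order bits of the byte.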
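The bit-ordering rule above can be sketched in Python. This is only an illustration of the two-bit/six-bit example from the text; the helper names pack_2_6 and unpack_2_6 are hypothetical and do not come from the specification.

```python
import struct
from typing import Tuple


def pack_2_6(two_bit_field: int, six_bit_field: int) -> int:
    """Pack a 2-bit field followed by a 6-bit field into one byte.

    The first field occupies the high-order bits, as the text describes.
    (Hypothetical helper, for illustration only.)
    """
    if not 0 <= two_bit_field < 4:
        raise ValueError("two_bit_field must fit in 2 bits")
    if not 0 <= six_bit_field < 64:
        raise ValueError("six_bit_field must fit in 6 bits")
    return (two_bit_field << 6) | six_bit_field


def unpack_2_6(byte: int) -> Tuple[int, int]:
    """Split one byte back into its 2-bit and 6-bit fields."""
    return byte >> 6, byte & 0x3F


# Multi-byte fields are big-endian: most significant byte first.
# The ">" prefix in the struct format selects big-endian packing.
encoded_size = struct.pack(">I", 1)
```

Here pack_2_6(0b10, 0b000001) gives 0x81, because the two-bit field lands in bits 7 and 6 of the byte.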

aligned(8) class Box (unsigned int(32) boxtype,
        optional unsigned int(8)[16] extended_type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype=='uuid') {
        unsigned int(8)[16] usertype = extended_type;
    }
}

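size is an integer that specifies the number of bytes in this box, including all its fields and contained boxes. If size is 1, the actual size is in the field largesize. If size is 0, this box is the last one in the file and its contents extend to the end of the file (normally only used for a Media Data Box).

type identifies the box type. Standard boxes use a compact type, which is normally four printable characters; to make them easy to identify, these are shown that way in the box definitions below. User extensions use an extended type; in this case the type field is set to 'uuid'.

Boxes with an unrecognized type shall be ignored and skipped.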
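As a rough illustration of the header rules above (compact size, largesize when size is 1, "to end of file" when size is 0, and the extra 16 bytes for 'uuid'), the following Python sketch walks the boxes in a byte buffer. The function name iter_boxes and the shape of the tuples it yields are assumptions made for this example; it is a minimal sketch, not a complete parser.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_start, box_end) for each box in data[start:end].

    Hypothetical helper: the caller decides which box types it understands
    and simply skips the rest by moving on to the next tuple.
    """
    if end is None:
        end = len(data)
    pos = start
    while pos + 8 <= end:
        # Compact header: 32-bit big-endian size, then a 4-byte type.
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # The real size is in the 64-bit largesize field after the type.
            if pos + 16 > end:
                raise ValueError("truncated largesize at offset %d" % pos)
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - pos
        if raw_type == b"uuid":
            # A 16-byte extended type follows the rest of the header.
            header += 16
        if size < header or pos + size > end:
            raise ValueError("malformed box at offset %d" % pos)
        yield raw_type.decode("latin-1"), pos + header, pos + size
        pos += size
```

Because each box carries its own total size, a reader that does not recognize a type can skip it without understanding its contents. Calling iter_boxes again on a container's payload range would walk its children in the same way.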

Many objects also contain a version number and a flags field:

aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f)
        extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}
The semantics of these two fields are:
version is an integer that specifies the version of this format of the box.
flags is a map of flags
Boxes with an unrecognized version shall be ignored and skipped.
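The version and flags fields together occupy the first four bytes of a FullBox payload. A minimal Python sketch of reading them is shown below; the function name parse_fullbox_header is hypothetical, and the payload is assumed to start right after the Box header.

```python
import struct
from typing import Tuple


def parse_fullbox_header(payload: bytes) -> Tuple[int, int]:
    """Return (version, flags) from the first 4 bytes of a FullBox payload.

    Hypothetical helper: version is the top 8 bits of the big-endian word,
    flags the remaining 24 bits.
    """
    if len(payload) < 4:
        raise ValueError("FullBox payload needs at least 4 bytes")
    (word,) = struct.unpack_from(">I", payload, 0)
    version = word >> 24
    flags = word & 0xFFFFFF
    return version, flags
```

A reader would typically compare the returned version against the versions it implements, and skip the box otherwise.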

4.3 File Type Box
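4.3.1 Definition

Box Type: 'ftyp'
Container: File
Mandatory: Yes
Quantity: Exactly one (but see below)

Files written to this version of this specification must contain a file-type box. For compatibility with an earlier version of this specification, files may be conformant to this specification and not contain a file-type box. Files with no file-type box should be read as if they contained an FTYP box with Major_brand='mp41', minor_version=0, and the single compatible brand 'mp41'.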

A media-file structured to this part of this specification may be compatible with more than one detailed specification, and it is therefore not always possible to speak of a single 'type' or 'brand' for the file. This means that the utility of the file name extension and Multipurpose Internet Mail Extension (MIME) type are somewhat reduced. (Translator's note: not entirely clear to me.)

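This box must be placed as early as possible in the file (after any obligatory signature, but before any significant boxes). It identifies which specification is the 'best use' of the file, a minor version of that specification, and a set of other specifications with which the file complies. Readers implementing this format should attempt to read files that are marked as compatible with any of the specifications that the reader implements. Any incompatible change in a specification should therefore register a new 'brand' identifier to identify files conformant to the new specification.

The minor version is informative only. It does not appear for compatible brands, and must not be used to determine the conformance of a file to a standard. It may allow more precise identification of the major specification, for inspection, debugging, or improved decoding.

Files would normally be externally identified (e.g. with a file extension or MIME type) in a way that indicates the 'best use' (the major brand), or the brand that the author believes will provide the greatest compatibility.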

This section of this specification does not define any brands. However, see subclause 6.3 below for brands for files conformant to the whole specification and not just this section. All file format brands defined in this specification are included in Annex E with a summary of which features they require.

4.3.2 Syntax

aligned(8) class FileTypeBox
        extends Box('ftyp') {
    unsigned int(32) major_brand;
    unsigned int(32) minor_version;
    unsigned int(32) compatible_brands[]; // to end of the box
}

4.3.3 Semantics

This box identifies the specifications with which the file complies.

Each brand is a printable four-character code, registered with ISO, that identifies a precise specification.
major_brand – is a brand identifier
minor_version – is an informative integer for the minor version of the major brand
compatible_brands – is a list, to the end of the box, of brands
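The File Type Box layout above is simple enough to sketch in Python. The names FileType, parse_ftyp, reader_supports and DEFAULT_FTYP are hypothetical and only illustrate the rules in this section: two 32-bit fields, followed by four-byte brands up to the end of the box, and the 'mp41' default for files without a file-type box. The payload is assumed to be the box contents after the Box header.

```python
import struct
from typing import Iterable, List, NamedTuple, Optional


class FileType(NamedTuple):
    major_brand: str
    minor_version: int
    compatible_brands: List[str]


# What a reader assumes when the file has no file-type box (see 4.3.1).
DEFAULT_FTYP = FileType("mp41", 0, ["mp41"])


def parse_ftyp(payload: Optional[bytes]) -> FileType:
    """Parse an 'ftyp' payload; None means the file has no file-type box."""
    if payload is None:
        return DEFAULT_FTYP
    if len(payload) < 8 or (len(payload) - 8) % 4 != 0:
        raise ValueError("ftyp payload has an invalid length")
    major, minor = struct.unpack_from(">4sI", payload, 0)
    # The brand list runs to the end of the box, four bytes per brand.
    brands = [payload[i:i + 4].decode("latin-1")
              for i in range(8, len(payload), 4)]
    return FileType(major.decode("latin-1"), minor, brands)


def reader_supports(file_type: FileType, implemented: Iterable[str]) -> bool:
    """True if any compatible brand is one the reader implements.

    minor_version is deliberately not consulted: it is informative only.
    """
    implemented_set = set(implemented)
    return any(b in implemented_set for b in file_type.compatible_brands)
```

Checking only compatible_brands is a design choice for this sketch, based on the statement that readers should attempt files marked as compatible with a specification they implement.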

5 Design Considerations

5.1 Usage
5.1.1 Introduction

The file format is intended to serve as a basis for a number of operations. In these various roles, it may be used in different ways, and different aspects of the overall design exercised.

5.1.2 Interchange

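When used as an interchange format, the files would normally be self-contained (not referencing media in other files), contain only the media data actually used in the presentation, and not contain any information related to streaming. This results in a small, protocol-independent, self-contained file that contains the core media data.

The following diagram gives an example of a simple interchange file, containing two streams.
[Figure: simple interchange file containing two streams]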

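5.1.3 Content Creation

During content creation, a number of areas of the format can be exercised to useful effect, particularly:
1. the ability to store each elementary stream separately (not interleaved), possibly in separate files;
2. the ability to work in a single presentation that contains media data and other streams (e.g. editing an audio track in uncompressed format, to align it with an already-prepared video track).

These characteristics mean that presentations may be prepared, edits applied, and content developed and integrated without repeatedly re-writing the presentation on disc, which would be necessary if interleaving were required and unused data had to be deleted. They also avoid repeatedly decoding and re-encoding the data, which would be necessary if the data had to be stored in an encoded state.

In the following diagram, a set of files being used in the process of content creation is shown.
[Figure: set of files used in the process of content creation]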

5.1.4 Preparation for streaming

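When prepared for streaming, the file must contain information that directs the streaming server in the process of sending the data. In addition, it is helpful if these instructions and the media data are interleaved, so that excessive seeking is avoided when serving the presentation. It is also important that the original media data be retained unscathed, so that the files may be verified, re-edited, or otherwise re-used. Finally, it is helpful if a single file can be prepared for more than one protocol, so that different servers may use it over different protocols.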

5.1.5 Local presentation

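'Locally' viewing a presentation (i.e. directly from the file, not over a streaming interconnect) is an important application; it is used when a presentation is distributed (e.g. on CD or DVD ROM), during the process of development, and when verifying the content on streaming servers. Such local viewing must be supported, with full random access. If the presentation is on CD or DVD ROM, interleaving is important because seeking may be slow.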

5.1.6 Streamed presentation

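When a server operates from the file to make a stream, the resulting stream must conform to the specifications for the protocol(s) used, and should contain no trace of the file-format information in the file itself. The server needs to be able to access the presentation randomly. It can be useful to re-use server content by referencing the same media data from multiple presentations; it can also assist streaming if the media data can stay on read-only media (e.g. CD) and is not copied, merely augmented, when prepared for streaming.

The following diagram shows a presentation prepared for streaming over a multiplexing protocol, only one hint track is required.
[Figure: presentation prepared for streaming over a multiplexing protocol, with one hint track]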

5.2 Design principles

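The file structure is object-oriented: a file can be decomposed into its constituent objects very simply, and the structure of each object is inferred directly from its type.

Media-data is not 'framed' by the file format; the file format declarations that give the size, type and position of media data units are not physically contiguous with the media data. This makes it possible to take a subset of the media data, and to use it in its natural state, without copying it to make space for framing. The metadata describes the media data by reference, not by inclusion.

Similarly, the protocol information for a particular streaming protocol does not frame the media data; the protocol headers are not physically contiguous with the media data. Instead, the media data can be included by reference. This makes it possible to represent media data in its natural state, not favouring any protocol. It also makes it possible for the same set of media data to serve for local presentation, and for multiple protocols.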

The protocol information is built in such a way that the streaming servers need to know only about the protocol and the way it should be sent; the protocol information abstracts knowledge of the media so that the servers are, to a large extent, media‐type agnostic. Similarly the media‐data, stored as it is in a protocol‐unaware fashion, enables the media tools to be protocol‐agnostic.
The file format does not require that a single presentation be in a single file. This enables both sub-setting and re-use of content. When combined with the non-framing approach, it also makes it possible to include media data in files not formatted to this specification (e.g. 'raw' files containing only media data and no declarative information, or file formats already in use in the media or computer industries).
The file format is based on a common set of designs and a rich set of possible structures and usages. The same format serves all usages; translation is not required. However, when used in a particular way (e.g. for local presentation), the file may need structuring in certain ways for optimal behaviour (e.g. time-ordering of the data). No normative structuring rules are defined by this specification, unless a restricted profile is used.

6 ISO Base Media File organization

6.1 Presentation structure
6.1.1 File Structure

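A presentation may be contained in several files. One file contains the metadata for the whole presentation, and is formatted to this specification. This file may also contain all the media data, in which case the presentation is self-contained. The other files, if used, are not required to be formatted to this specification; they are used to contain media data, and may also contain unused media data or other information. This specification concerns the structure of the presentation file only. The format of the media-data files is constrained by this specification only in that their media data must be capable of being described by the metadata defined here.

These other files may be ISO files, image files, or other formats. Only the media data itself, such as JPEG 2000 images, is stored in these other files; all timing and framing (position and size) information is in the ISO base media file, so the ancillary files are essentially free-format.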

If an ISO file contains hint tracks, the media tracks that reference the media data from which the hints were built shall remain in the file, even if the data within them is not directly referenced by the hint tracks; after deleting all hint tracks, the entire un‐hinted presentation shall remain. Note that the media tracks may, however, refer to external files for their media data.

6.1.2 Object Structure

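The file is structured as a sequence of objects; some of these objects may contain other objects. The sequence of objects in the file shall contain exactly one presentation metadata wrapper (the Movie Box). It is usually close to the beginning or end of the file, so that it is easy to locate. The other objects found at this level may be a File-Type box, Free Space Boxes, Movie Fragments, Meta-data, or Media Data Boxes.

6.1.3 Meta Data and Media Data

The metadata is contained within the metadata wrapper (the Movie Box); the media data is contained either in the same file, within Media Data Box(es), or in other files. The media data is composed of images or audio data; the media data objects, or media data files, may contain other un-referenced information.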

6.1.4 Track Identifiers

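The track identifiers used in an ISO file are unique within that file; no two tracks shall use the same identifier.

The next track identifier value, stored in next_track_ID in the Movie Header Box, generally contains a value one greater than the largest track identifier value found in the file. This makes it easy to generate a track identifier in most circumstances. However, if this value is all ones (32-bit unsigned maxint), then a search for an unused track identifier is needed for every addition.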
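The allocation rule above could be sketched as follows. The function name allocate_track_id, its arguments, and the choice to start the search at 1 are assumptions made for this example; the source only states when a search is needed, not how it is performed.

```python
from typing import Set, Tuple

MAX_U32 = 0xFFFFFFFF  # all ones: signals that a search is required


def allocate_track_id(next_track_id: int, used_ids: Set[int]) -> Tuple[int, int]:
    """Return (new_track_id, updated_next_track_id).

    Hypothetical helper. In the common case next_track_id is used directly.
    When it is all ones, an unused identifier is searched for instead.
    """
    if next_track_id != MAX_U32:
        if next_track_id in used_ids:
            raise ValueError("next_track_ID collides with an existing track")
        return next_track_id, next_track_id + 1
    # Assumption of this sketch: identifiers start at 1 and the lowest
    # unused value is taken. range() is lazy, so this does not allocate.
    for candidate in range(1, MAX_U32):
        if candidate not in used_ids:
            return candidate, MAX_U32
    raise ValueError("no unused track identifier available")
```

For a file with tracks 1 and 2 and next_track_ID equal to 3, the call returns (3, 4). Once next_track_ID has reached all ones, it stays there and every later addition goes through the search.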

6.2 Metadata Structure (Objects)
6.2.1 Box

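Type fields not defined here are reserved. Private extensions shall be achieved through the 'uuid' type. In addition, the following types are not and will not be used, or will be used only in their existing sense, in future versions of this specification, to avoid conflict with existing content using earlier pre-standard versions of this format: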
clip, crgn, matt, kmat, pnot, ctab, load, imap;
these track reference types (as found in the reference_type of a Track Reference Box): tmcd, chap, sync, scpt, ssrc.
A number of boxes contain index values into sequences in other boxes. These indexes start with the value 1 (1 is the first entry in the sequence).

6.2.2 Data Types and fields

In a number of boxes in this specification, there are two variant forms: version 0 using 32‐bit fields, and version 1 using 64‐bit sizes for those same fields. In general, if a version 0 box (32‐bit field sizes) can be used, it should be; version 1 boxes should be used only when the 64‐bit field sizes they permit, are required. Values for counters, offsets, times, durations etc. in this format do not ‘wrap’ to 0 when the maximum value that can be stored in their field is reached; appropriately large fields must be used for all values.

For convenience during content creation, creation and modification times are stored in the file. These can be 32-bit or 64-bit numbers, counting seconds since midnight, Jan. 1, 1904, which is a convenient date for leap-year calculations. 32 bits are sufficient until approximately year 2040. These times shall be expressed in Universal Time Coordinated (UTC), and therefore may need adjustment to local time if displayed.
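Converting these timestamps is a matter of adding the stored seconds to the 1904 epoch. The sketch below uses Python's standard datetime module; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Midnight, Jan. 1, 1904, in UTC: the epoch used for these fields.
EPOCH_1904 = datetime(1904, 1, 1, tzinfo=timezone.utc)


def iso_time_to_datetime(seconds: int) -> datetime:
    """Convert a stored creation/modification time to an aware UTC datetime."""
    return EPOCH_1904 + timedelta(seconds=seconds)


def datetime_to_iso_time(moment: datetime) -> int:
    """Convert an aware datetime to seconds since the 1904 epoch."""
    return int((moment - EPOCH_1904).total_seconds())


# The largest 32-bit value falls in 2040, matching the limit in the text.
last_32bit_moment = iso_time_to_datetime(0xFFFFFFFF)
```

The result is in UTC; a player that displays it would convert to local time separately, for example with datetime.astimezone().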

Fixed‐point numbers are signed or unsigned values resulting from dividing an integer by an appropriate power of 2. For example, a 30.2 fixed‐point number is formed by dividing a 32‐bit integer by 4.
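The fixed-point rule amounts to one division by a power of two. A minimal Python sketch follows; the helper names are hypothetical, and the raw value is assumed to have already been read as a signed or unsigned integer as appropriate for the field.

```python
def fixed_to_float(raw: int, fraction_bits: int) -> float:
    """Interpret an integer as a fixed-point number with fraction_bits
    bits after the binary point (hypothetical helper)."""
    return raw / (1 << fraction_bits)


def float_to_fixed(value: float, fraction_bits: int) -> int:
    """Inverse of fixed_to_float, rounding to the nearest representable value."""
    return round(value * (1 << fraction_bits))
```

With the 30.2 example from the text, fixed_to_float(10, 2) divides 10 by 4 and gives 2.5.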

Fields shown as “template” in the box descriptions are optional in the specifications that use this specification. If the field is used in another specification, that use must be conformant with its definition here, and the specification must define whether the use is optional or mandatory. Similarly, fields marked “pre‐defined” were used in an earlier version of this specification. For both kinds of fields, if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it must be copied un‐inspected when boxes are copied, and ignored on reading.

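The matrix values that occur in the headers specify a transformation of video images for presentation. Not all derived specifications use matrices; if they are not used, they shall be set to the identity matrix. If a matrix is used, the point (p,q) is transformed into (p', q') using the matrix as follows:
\[
\begin{pmatrix} p & q & 1 \end{pmatrix}
\begin{pmatrix} a & b & u \\ c & d & v \\ x & y & w \end{pmatrix}
= \begin{pmatrix} m & n & z \end{pmatrix}
\]
\[
m = ap + cq + x; \quad n = bp + dq + y; \quad z = up + vq + w; \quad p' = m/z; \quad q' = n/z
\]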

The coordinates {p,q} are on the decoded frame, and {p', q'} are at the rendering output.
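The transformation can be checked numerically with a short Python sketch. The function name transform_point and the nested-tuple matrix layout ((a, b, u), (c, d, v), (x, y, w)) are assumptions made for this example; the values are taken as already converted to ordinary numbers.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]

# Identity matrix: leaves every point unchanged.
IDENTITY: Matrix = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def transform_point(p: float, q: float, matrix: Matrix) -> Tuple[float, float]:
    """Map a decoded-frame point (p, q) to the rendering output (p', q').

    Hypothetical helper following the formulas in the text:
    (p q 1) * matrix = (m n z), then p' = m/z and q' = n/z.
    """
    (a, b, u), (c, d, v), (x, y, w) = matrix
    m = a * p + c * q + x
    n = b * p + d * q + y
    z = u * p + v * q + w
    if z == 0:
        raise ZeroDivisionError("matrix maps the point to infinity (z == 0)")
    return m / z, n / z
```

With u = v = 0 and w = 1, the matrix is a plain affine transform: a, b, c, d carry scaling and rotation, and x, y carry the translation.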
