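ASF File Format Analysis

Microsoft's ASF documentation actually exists in two versions, 1.0 and 2.0. This article analyzes version 1.0 of the ASF file format.
The version published on Microsoft's official website is 2.0, but unfortunately almost nobody uses it.
Microsoft's ASF 2.0 documentation:
http://www.microsoft.com/windows/windowsmedia/forpros/format/asfspec.aspx

Almost all .asf, .asx, .wmv and .wma files that you can download from the Internet today use version 1.0, which Microsoft never published.

No official documentation for this version has ever appeared; there is even a rumour that it is a Microsoft internal secret.
Microsoft has implemented ASF playback and encoders only for Windows and Mac.

So the documentation for version 1.0 of ASF ( Advanced Streaming Format ) was worked out by programmers themselves, who combined various sources with their own analysis of the file structure.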


ASF

Background


   Advanced ( formerly Active ) Streaming Format was developed by Microsoft in 1995-1998. Its main purpose is to serve as a universal format for storing and streaming media. There are two versions of ASF. The version known as 2.0 is well documented and its specifications are publicly available. Unfortunately, they are not very helpful for developers because this format is not widely used ( if used at all ).
   On the other hand, there is another version of the ASF format ( 1.0 ), and it is extremely popular. All files with extensions .asf, .asx, .wmv and .wma that you can find on the 'Net are stored in ASF 1.0. Microsoft never released any documentation covering this format. There's a rumour that this format is even patented! This situation is similar to the one with the MPEG-4 specifications: Microsoft appears to take an active part in the development of specifications for MPEG-4 but does not use these formats in its products; instead, it promotes their closed-source variations ( DivX ;-) and Windows Media Video ).
   As long as Microsoft does not provide implementations of an ASF reader or writer for any platform except Windows and Macintosh, at least a minimal specification of the format is needed to implement tools for working with ASF 1.0 on all other platforms. This document tries to organize all available information about the format, gathered from different sources.
   Readers are encouraged to get acquainted with the ASF 2.0 specifications to better understand the ideas behind the format and the other features that it offers.

Disclaimer

 

This specification was created by analyzing data contained in freely-available media files. No reverse-engineering or other illegal activity took place during the collection of this information. Neither the author nor any contributor guarantees that any bit of this information is correct.

Data types

 

UINT8, UINT16, UINT32, UINT64 - unsigned integer values, 8, 16, 32 or 64 bits long. With the GNU C compiler on 32-bit targets they are represented by the types 'unsigned char', 'unsigned short', 'unsigned long' and 'unsigned long long'. On 64-bit Unix-like targets 'unsigned long' is 64 bits wide, so the fixed-width types from <stdint.h> ( uint8_t, uint16_t, uint32_t, uint64_t ) are a safer choice.
FILETIME - unsigned 64-bit integer. Number of 100-nanosecond intervals since midnight, January 1, 1601, GMT.
GUID - 128-bit value that can be generated on any system using a special algorithm. The algorithm is designed to make every such value unique ( two different computers, or even the same computer at different moments of time, should never generate the same GUID ).
BITMAPINFOHEADER - universal structure that describes the format of a ( compressed ) image.

#include <stdint.h>

// Fixed-width types are used because 'long' is 64 bits wide
// on many 64-bit compilers, which would break the on-disk layout.
typedef struct
{
    uint32_t   biSize;          // sizeof(BITMAPINFOHEADER)
    int32_t    biWidth;
    int32_t    biHeight;
    uint16_t   biPlanes;        // unused
    uint16_t   biBitCount;
    uint32_t   biCompression;   // fourcc of image
    uint32_t   biSizeImage;     // size of image. For uncompressed images
                                // ( biCompression 0 or 3 ) can be zero.
    int32_t    biXPelsPerMeter; // unused
    int32_t    biYPelsPerMeter; // unused
    uint32_t   biClrUsed;       // valid only for palettized images.
                                // Number of colors in palette.
    uint32_t   biClrImportant;
} BITMAPINFOHEADER;


WAVEFORMATEX - universal structure that describes format of a ( compressed ) sound stream.

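#include <stdint.h>

// The fields below occupy 18 bytes in the file. A compiler may pad
// sizeof(WAVEFORMATEX) to 20, so read the fields one by one or pack the struct.
typedef struct
{
  uint16_t  wFormatTag;      // value that identifies compression format
  uint16_t  nChannels;
  uint32_t  nSamplesPerSec;
  uint32_t  nAvgBytesPerSec;
  uint16_t  nBlockAlign;     // size of a data sample
  uint16_t  wBitsPerSample;
  uint16_t  cbSize;          // size of format-specific data
} WAVEFORMATEX;

This structure is immediately followed by an array of bytes of size cbSize.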


All time intervals are either measured in 100-nanosecond steps and represented with a 64-bit type ( they wrap around roughly every 58,000 years ), or measured in milliseconds and represented with 32-bit ( they wrap around roughly every 49.7 days ) or 16-bit types ( every 65.5 seconds ).
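
The time units above can be converted with plain integer arithmetic. The following is a minimal sketch, not part of the original analysis: the function names are made up for illustration, and the constant is the number of seconds between January 1, 1601 and January 1, 1970.

```c
#include <stdint.h>

/* Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch). */
#define FILETIME_TO_UNIX_EPOCH_SECONDS 11644473600LL

/* FILETIME ticks (100 ns each) -> seconds since the Unix epoch.
 * The result is negative for dates before 1970. */
int64_t filetime_to_unix_seconds(uint64_t filetime)
{
    return (int64_t)(filetime / 10000000ULL) - FILETIME_TO_UNIX_EPOCH_SECONDS;
}

/* 64-bit interval in 100 ns steps -> milliseconds (fraction truncated). */
uint64_t ticks_to_milliseconds(uint64_t ticks)
{
    return ticks / 10000ULL;
}
```

The second helper is handy when a 64-bit duration has to be compared with the millisecond timestamps stored in packets.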

Basic information

 

   An ASF 1.0 file consists of 'chunks'. They are similar to chunks in the AVI format, but their fields are larger.
Chunk:

Field         Type    Size (bytes)
Chunk type    GUID    16
Chunk length  UINT64  8
Data          -       Variable

Chunk type describes type of content in the chunk. See below for list of known chunk type GUIDs.
Chunk length corresponds to the entire chunk ( i.e. length of data only is chunk length minus 24 ).
The other important concept is 'packet'. Since the format is supposed to be streamable, all actual data, such as compressed audio or video, is stored in 'packets'. Unlike in ASF 2.0, all packets have fixed size.
Each valid file should contain at least two chunks. They are File Header Chunk and Data Chunk. File Header Chunk contains all the information required to start processing actual data, while Data Chunk contains data packets.
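
A reader can walk through a file by parsing the 24-byte chunk header and skipping 'chunk length' bytes from the start of the chunk. Below is a minimal sketch under two assumptions that this document does not state explicitly: integers are stored little-endian, and the GUID is kept as the 16 raw bytes found in the file. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ASF_CHUNK_HEADER_SIZE 24

typedef struct {
    uint8_t  guid[16];  /* chunk type, raw bytes as stored in the file */
    uint64_t length;    /* length of the entire chunk, header included */
} asf_chunk_header;

/* Read an unsigned little-endian integer of n bytes (n <= 8). */
uint64_t chunk_read_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Parse a chunk header from buf. Returns 0 on success, -1 if the buffer
 * is too short or the stored length is smaller than the header itself. */
int asf_parse_chunk_header(const uint8_t *buf, size_t avail,
                           asf_chunk_header *out)
{
    if (avail < ASF_CHUNK_HEADER_SIZE)
        return -1;
    memcpy(out->guid, buf, 16);
    out->length = chunk_read_le(buf + 16, 8);
    if (out->length < ASF_CHUNK_HEADER_SIZE)
        return -1;
    return 0;
}

/* Length of the data field only: chunk length minus 24. */
uint64_t asf_chunk_data_length(const asf_chunk_header *h)
{
    return h->length - ASF_CHUNK_HEADER_SIZE;
}
```

Rejecting lengths below 24 keeps a damaged file from sending the reader into an endless loop.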

Headers


File Header chunk:

Field                Type    Size (bytes)
Chunk type           GUID    16
Chunk length         UINT64  8
Number of subchunks  UINT32  4
Unknown              -       2
Chunks               -       Variable

This chunk is special because it contains other chunks in the data field. There may be any number of such chunks, but we need to know about two special kinds of them.

Header Object:

Field                                           Type      Size (bytes)
Chunk type                                      GUID      16
Chunk length                                    UINT64    8
Client GUID                                     GUID      16
File size                                       UINT64    8
File creation time                              FILETIME  8
Number of packets                               UINT64    8
Timestamp of the end position                   UINT64    8
Duration of the playback                        UINT64    8
Timestamp of the start position                 UINT32    4
Unknown, maybe reserved ( usually contains 0 )  UINT32    4
Flags ( usually contains 2 )                    UINT32    4
Minimum size of packet, in bytes                UINT32    4
Maximum size of packet                          UINT32    4
Size of uncompressed video frame                UINT32    4

Value 0x02 in flags probably means that the file is seekable.
Minimum & maximum sizes of packet are typically equal. It is not precisely known how to handle ASF file if it's not true.
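
Summing the field sizes in the Header Object table gives a fixed layout of 104 bytes, with the flags at offset 88 and the two packet sizes at offsets 92 and 96. The sketch below reads the fields a reader is most likely to need. The offsets are derived from the table, little-endian byte order is an assumption, and every name is invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define ASF_HEADER_OBJECT_SIZE 104

typedef struct {
    uint64_t file_size;
    uint64_t creation_time;   /* FILETIME */
    uint64_t packet_count;
    uint64_t duration;
    uint32_t flags;
    uint32_t min_packet_size;
    uint32_t max_packet_size;
} asf_file_info;

/* Read an unsigned little-endian integer of n bytes (n <= 8). */
uint64_t hdr_read_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* obj points at the start of the Header Object (its chunk type GUID).
 * Returns 0 on success, -1 if fewer than 104 bytes are available. */
int asf_parse_header_object(const uint8_t *obj, size_t len, asf_file_info *out)
{
    if (len < ASF_HEADER_OBJECT_SIZE)
        return -1;
    out->file_size       = hdr_read_le(obj + 40, 8);
    out->creation_time   = hdr_read_le(obj + 48, 8);
    out->packet_count    = hdr_read_le(obj + 56, 8);
    out->duration        = hdr_read_le(obj + 72, 8);
    out->flags           = (uint32_t)hdr_read_le(obj + 88, 4);
    out->min_packet_size = (uint32_t)hdr_read_le(obj + 92, 4);
    out->max_packet_size = (uint32_t)hdr_read_le(obj + 96, 4);
    return 0;
}

/* Returns the fixed packet size, or 0 when minimum and maximum differ
 * (a case this document does not know how to handle). */
uint32_t asf_fixed_packet_size(const asf_file_info *info)
{
    return info->min_packet_size == info->max_packet_size
         ? info->min_packet_size : 0;
}
```

Returning 0 for mismatched sizes lets the caller refuse the file instead of guessing at a packet boundary.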
Stream Object:

Field                                           Type    Size (bytes)
Chunk type                                      GUID    16
Chunk length                                    UINT64  8
Stream type (audio/video)                       GUID    16
Audio error concealment type                    GUID    16
Unknown, maybe reserved ( usually contains 0 )  UINT64  8
Total size of type-specific data                UINT32  4
Size of stream-specific data                    UINT32  4
Stream number                                   UINT16  2
Unknown                                         UINT32  4
Type-specific                                   -       Variable
Stream-specific                                 -       Variable

Type-specific data is data whose meaning can be derived only from the stream type. It may be followed by fields that also depend on the value of the audio error concealment type.
Second unknown value in this object seems to be absolutely random, but if there is more than one stream in the file, they all hold the same value here.
Type-specific data for video stream:

Field                  Type              Size (bytes)
Picture width          UINT32            4
Picture height         UINT32            4
Unknown                UINT8             1
BITMAPINFOHEADER size  UINT32            4
Picture format         BITMAPINFOHEADER  Variable

Field 'Picture format' usually contains BITMAPINFOHEADER structure, which is 40 bytes long, but it is not a good idea to rely on this fact, since it may contain something of a larger size.

Type-specific data for audio stream:

Field                   Type          Size (bytes)
Sound format            WAVEFORMATEX  18 ( 2+2+4+4+2+2+2, including cbSize )
Sound format extension  -             Variable

Size of sound format extension is equal to cbSize member of WAVEFORMATEX structure.

Stream-specific data for audio stream:

Field                                                   Type    Size (bytes)
H, Total number of audio blocks in each scramble group  UINT8   1
W, Byte size of each scrambling chunk                   UINT16  2
Block_align_1, usually = nBlockAlign                    UINT16  2
Block_align_2, usually = nBlockAlign                    UINT16  2
Unknown                                                 UINT8   1

This data is only present if 'Audio error concealment type' field in the main structure contains corresponding GUID. See section 'Audio error concealment' for details on this field.

All valid ASF files contain one Header Object, as well as one Stream Object per stream.

Data chunk


Data chunk:

Field              Type    Size (bytes)
Chunk type         GUID    16
Chunk length       UINT64  8
Unknown            GUID    16
Number of packets  UINT64  8
Unknown            UINT8   1
Unknown            UINT8   1
Packets            -       Variable

As mentioned above, packets have fixed size. It can be found in the corresponding field of Header Object.

Packets


   Compressed video and audio data are usually organized into 'frames' or 'objects' of an arbitrary size. When one needs to transfer such data in packets of a fixed size, there are three possible cases:
a) Frame size is close to the size of the packet. It would be acceptable to store the frame completely in one packet and pad it to needed size.
b) Frame is larger than the packet. Then it needs to be 'fragmented' into several fragments and sent in different packets.
c) Frame is significantly less than the packet. In this case it would be a good idea to send multiple frames in the same packet. It is called 'grouping'.
<Packet>: <Header> <Segment> [<Segment>] ... <Padding>
There may be several formats of headers, but packets in most movies start with the V82_Header:

Field                                    Type      Size (bytes)
0x82                                     UINT8     1
Always 0x0 (?)                           UINT16    2
Flags                                    UINT8     1
Segment type ID                          UINT8     1
Packet size                              UINT16    0 or 2 ( present if bit 0x40 is set in flags )
Padding size                             Variable  0, 1 or 2 ( depends on flags )
Send time, milliseconds                  UINT32    4
Duration, milliseconds                   UINT16    2
Number of segments & segment properties  UINT8     0 or 1 ( depends on flags )

Flags are bitwise OR of:
0x40 Explicit packet size specified
0x10 16-bit padding size specified
0x08 8-bit padding size specified
0x01 More than one segment


The precise meaning of 'packet size' is not known. It rarely appears in ASF streams, and when it does, it gives the complete length of data in this packet ( from the beginning of the packet header to the end of the last segment ). Flag 0x40 is sometimes OR'ed with 0x10 or 0x08, but I've never seen packets with a nonzero padding size specified and 0x40 set in flags.
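
The V82_Header layout can be sketched as a small parser. This is an illustration only: it assumes little-endian integers, takes the padding size as 2 bytes when 0x10 is set and 1 byte when 0x08 is set, and interprets the 'Number of segments' byte as described in the notes on segment fields ( lower 6 bits hold the count, bit 0x40 selects a 1-byte 'Data length', anything else a 2-byte one ). The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

enum {
    V82_EXPLICIT_SIZE = 0x40,
    V82_PADDING_16    = 0x10,
    V82_PADDING_8     = 0x08,
    V82_MULTI_SEGMENT = 0x01
};

typedef struct {
    uint8_t  flags;
    uint8_t  segment_type_id;
    uint16_t packet_size;        /* 0 when the field is absent */
    uint16_t padding_size;       /* 0 when the field is absent */
    uint32_t send_time_ms;
    uint16_t duration_ms;
    uint8_t  segment_count;      /* 1 when the field is absent */
    uint8_t  data_length_bytes;  /* width of 'Data length', 0 if absent */
    size_t   header_size;        /* bytes consumed by the header */
} v82_header;

/* Returns 0 on success, -1 on a short buffer or a non-0x82 packet. */
int v82_parse_header(const uint8_t *p, size_t len, v82_header *h)
{
    size_t pos = 5;
    if (len < 5 || p[0] != 0x82)
        return -1;
    h->flags = p[3];
    h->segment_type_id = p[4];
    h->packet_size = 0;
    h->padding_size = 0;
    if (h->flags & V82_EXPLICIT_SIZE) {
        if (len - pos < 2) return -1;
        h->packet_size = (uint16_t)(p[pos] | (p[pos + 1] << 8));
        pos += 2;
    }
    if (h->flags & V82_PADDING_16) {
        if (len - pos < 2) return -1;
        h->padding_size = (uint16_t)(p[pos] | (p[pos + 1] << 8));
        pos += 2;
    } else if (h->flags & V82_PADDING_8) {
        if (len - pos < 1) return -1;
        h->padding_size = p[pos];
        pos += 1;
    }
    if (len - pos < 6) return -1;
    h->send_time_ms = (uint32_t)p[pos] | ((uint32_t)p[pos + 1] << 8) |
                      ((uint32_t)p[pos + 2] << 16) | ((uint32_t)p[pos + 3] << 24);
    h->duration_ms = (uint16_t)(p[pos + 4] | (p[pos + 5] << 8));
    pos += 6;
    h->segment_count = 1;
    h->data_length_bytes = 0;
    if (h->flags & V82_MULTI_SEGMENT) {
        if (len - pos < 1) return -1;
        h->segment_count = p[pos] & 0x3F;
        h->data_length_bytes = (p[pos] & 0x40) ? 1 : 2;
        pos += 1;
    }
    h->header_size = pos;
    return 0;
}
```

Segments start at offset header_size, and the last padding_size bytes of the packet carry no data.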
Segment:

Field                    Type   Size (bytes)
Stream ID                UINT8  1
Sequence number          UINT8  1
Segment-specific fields  -      Variable


Most significant bit ( 0x80 ) is set in the stream ID if the segment contains a keyframe.
Here things become a bit more complicated. Segment-specific fields depend on whether this segment is grouped ( i.e. it contains more than one frame ) or not. This can be deduced from the flags value, which is itself one of the segment-specific fields!

Segment-specific fields, no grouping:

Field                            Type                     Size (bytes)
Fragment offset                  UINT8, UINT16 or UINT32  Variable
Flags                            UINT8                    1
Object length                    UINT32                   4
Object start time, milliseconds  UINT32                   4
Data length                      UINT8 or UINT16          0, 1 or 2
Data                             -                        Variable

"Fragment offset" is offset of this fragment in the object ( e.g. video frame ) that contains it. For complete frame in the fragment, fragment offset is 0 and data length is equal to object length.
"Flags" can be either 0x01 or 0x08. 0x01 means "grouping ( multiple objects in segment )", and 0x08 means "no grouping ( single object or fragment )".
"Data length" field is not needed if this segment is the only one in the packet, because in this case data takes all remaining space in the packet ( of course, taking padding into account ). Thus, it's only present when bit 0x01 is set in packet flags.
"Fragment offset" field size is determined by 'Segment Type ID' packet header value. Known possible values for the latter are 0x55, 0x59 and 0x5D, which correspond to 1, 2 and 4 byte sizes.
"Data length" field size is determined by 'Number of segments' packet header value. When 'Number of segments' field is present, its lower bits ( probably 6 of them ) contain number of segments, set bit 0x40 means that 'Data length' segment field is 1-byte wide, and set bit 0x80 means that 'Data length' segment field is 2-byte wide. Otherwise, this field size defaults to 2 bytes.

Segment-specific fields, grouping:

Field                            Type                     Size (bytes)
Object start time, milliseconds  UINT8, UINT16 or UINT32  Variable
Flags                            UINT8                    1
Unknown                          UINT8                    1
Data length                      UINT16                   0 or 2

Repeat until we run out of data length:

Object length                    UINT8                    1
Data                             -                        Variable
...


This structure is similar to the one with 'no grouping', but it does not have 'fragment offset' field, because fragmentation and grouping can not take place simultaneously.
Each segment has a field called 'sequence number'. It can be used to reassemble fragmented objects. Subsequent objects have sequence numbers that differ by 1 ( there will be larger skips in 'sequence number' fields when grouping takes place ). Different fragments of the same object have the same sequence number and the same object start time.
Packets are usually organized in order of increasing timestamps. It is not known if it's always true. Packets may be missing, and this case should be properly handled.
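
As an illustration of the segment layout without grouping, the sketch below parses one such segment. It assumes little-endian integers, maps 'Segment type ID' values 0x55, 0x59 and 0x5D to a 1, 2 or 4-byte 'Fragment offset', and expects the caller to pass the width of the 'Data length' field ( 0 when the segment is the only one in the packet ) together with a buffer that already excludes padding. Grouped segments are rejected rather than parsed. All names are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t  stream_id;        /* keyframe bit removed */
    int      keyframe;         /* 1 if bit 0x80 was set in the stream ID */
    uint8_t  sequence;
    uint32_t fragment_offset;
    uint32_t object_length;
    uint32_t object_start_ms;
    uint32_t data_length;
    size_t   data_pos;         /* offset of the data inside the segment */
} asf_segment;

/* Width in bytes of 'Fragment offset', or -1 for an unknown type ID. */
int asf_offset_field_size(uint8_t segment_type_id)
{
    switch (segment_type_id) {
    case 0x55: return 1;
    case 0x59: return 2;
    case 0x5D: return 4;
    default:   return -1;
    }
}

/* Read an unsigned little-endian integer of n bytes (n <= 4). */
uint32_t seg_read_le(const uint8_t *p, int n)
{
    uint32_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 on short data, unknown type ID or grouping. */
int asf_parse_plain_segment(const uint8_t *p, size_t len,
                            uint8_t segment_type_id, int data_length_bytes,
                            asf_segment *s)
{
    int off_size = asf_offset_field_size(segment_type_id);
    size_t pos = 2;
    if (off_size < 0 || data_length_bytes < 0 || data_length_bytes > 2)
        return -1;
    if (len < 2 || len - 2 < (size_t)off_size + 9 + (size_t)data_length_bytes)
        return -1;
    s->stream_id = p[0] & 0x7F;
    s->keyframe  = (p[0] & 0x80) != 0;
    s->sequence  = p[1];
    s->fragment_offset = seg_read_le(p + pos, off_size);
    pos += (size_t)off_size;
    if (p[pos] != 0x08)              /* 0x08 = no grouping */
        return -1;
    pos += 1;
    s->object_length   = seg_read_le(p + pos, 4);
    s->object_start_ms = seg_read_le(p + pos + 4, 4);
    pos += 8;
    if (data_length_bytes == 0) {
        s->data_length = (uint32_t)(len - pos);   /* rest of the packet */
    } else {
        s->data_length = seg_read_le(p + pos, data_length_bytes);
        pos += (size_t)data_length_bytes;
        if (s->data_length > len - pos)
            return -1;
    }
    s->data_pos = pos;
    return 0;
}
```

A fragment is complete on its own when fragment_offset is 0 and data_length equals object_length; otherwise fragments with the same sequence number are copied into one buffer at their offsets.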

Audio error concealment


   Sometimes compressed audio is stored in the stream in a special 'scrambled' manner. It should be descrambled before the data is passed to the audio decompressor. This technique is supposed to increase the stream's tolerance to errors.
All audio data is separated into 'audio blocks'. The size of an audio block is a multiple of the data sample size. The process is defined by two variables: audio block length ( Width ) and number of audio blocks in a 'scrambling chunk' ( Height ). The process is simplest to demonstrate with a picture.

Data sent to decoder: [0] [4] [8] [1] [5] [9] [2] [6] [10] [3] [7] [11]

width=4
height=3

[0] [1] [2]  [3]
[4] [5] [6]  [7]
[8] [9] [10] [11]

Data stored in the stream: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

 

Here each [x] is a data region with the size specified in the Block_align_1 field of the scramble definition structure. Height is the first field of that structure ( H ), and Width, counted in such regions, is the second field ( W ) divided by the third ( Block_align_1 ).
When total amount of data is not multiple of 'scrambling chunk' size ( in bytes, that's first field times second field ), the remaining part is written as is, without scrambling.
Even when GUID in the stream header indicates that audio is scrambled, there may be no need in it, because very often values of W or H are equal to 1.
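
The picture above can be turned into a short descrambling routine. This is a sketch of one reading of the description, not a confirmed algorithm: it assumes that the number of rows is the H field, that one row occupies W bytes, and that a block is Block_align_1 bytes, so that a full group is H times W bytes. Function and parameter names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copies len bytes from in to out, undoing the interleaving.
 * rows      - H field ( number of rows in the picture )
 * row_bytes - W field ( bytes in one row )
 * block     - Block_align_1 ( bytes in one [x] region )
 * in and out must not overlap. A trailing partial group is copied as is. */
void asf_descramble(const uint8_t *in, uint8_t *out, size_t len,
                    unsigned rows, unsigned row_bytes, unsigned block)
{
    size_t cols, group, done = 0;

    /* Nothing to undo, or parameters that do not describe a grid. */
    if (rows <= 1 || block == 0 || row_bytes < block || row_bytes % block != 0) {
        memcpy(out, in, len);
        return;
    }
    cols  = row_bytes / block;            /* blocks per row ( width ) */
    group = (size_t)rows * row_bytes;     /* bytes in one scrambling group */

    while (len - done >= group) {
        const uint8_t *src = in + done;
        uint8_t *dst = out + done;
        /* Read the stored grid column by column. */
        for (size_t c = 0; c < cols; c++) {
            for (size_t r = 0; r < rows; r++) {
                memcpy(dst, src + (r * cols + c) * block, block);
                dst += block;
            }
        }
        done += group;
    }
    memcpy(out + done, in + done, len - done);
}
```

With rows=3, row_bytes=4 and block=1 the routine reproduces the order shown in the picture, and a leftover shorter than one group passes through unchanged.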

Streaming over the Internet


   Media content in ASF format can be streamed over the Internet in several ways. Most popular way is streaming using HTTP protocol. Other protocols, such as UDP, may be supported as well.
URLs for ASF files may lead to 'redirectors'. A redirector is an XML file that describes the media that it refers to, and includes other URLs and additional data needed for stream playback. Redirector files often have the extension .asx, but it's probably not a requirement. Some details can be found at http://msdn.microsoft.com/peerjournal/wm/g060199a.asp.

Streaming using HTTP protocol


ASF URLs that start with http:// or mms:// refer to streams that are delivered to the end user over a protocol based on HTTP. They can consist of redirectors, pre-recorded or live ( broadcast ) data. To start transmission, the client program connects to the server using TCP ( often on port 80 ), sends an HTTP request and listens for data.
Here are descriptions of HTTP requests, in sprintf()-compatible form.

The initial HTTP request of the media player. It is used to query the media type header of the stream ( needed to check whether the codecs are installed on the client and to obtain the type of stream: live stream, pre-recorded content, etc. ). Note that the request-context changes with every new HTTP request:

"GET %s HTTP/1.0\r\n", filename
"Accept: */*\r\n"
"User-Agent: NSPlayer/4.1.0.3856\r\n"
"Host: %s\r\n", server_name
"Pragma: no-cache,rate=1.000000,stream-time=0,stream-offset=0:0,request-context=1,max-duration=0\r\n"
"Pragma: xClientGUID={c77e7400-738a-11d2-9add-0020af0a3278}\r\n"
"Connection: Close\r\n\r\n"

The HTTP request that starts downloading prerecorded (=seekable) content. The stream-offset parameter defines the start offset in the ASF file on the server. The stream-time is the timecode (milliseconds) for seeking within the stream:

"GET %s HTTP/1.0\r\n", file
"Accept: */*\r\n"
"User-Agent: NSPlayer/4.1.0.3856\r\n"
"Host: %s\r\n", server_name
"Pragma: no-cache,rate=1.000000,stream-time=0,stream-offset=%u:%u,request-context=2,max-duration=%u\r\n", offset_hi, offset_lo, length
"Pragma: xPlayStrm=1\r\n"
"Pragma: xClientGUID={c77e7400-738a-11d2-9add-0020af0a3278}\r\n"
"Pragma: stream-switch-count=%d\r\n", num_streams
"Pragma: stream-switch-entry=%s\r\n", stream_selection
"Connection: Close\r\n\r\n"

Pay some attention to lines with 'stream-switch-count' and 'stream-switch-entry'. First line includes a number of streams which you want to receive. Second line includes a string in the following form:
ffff:1:0 ffff:2:2 ffff:4:2 ( etc. )
where each entry corresponds to one stream, first value is always 'ffff', second value is the stream ID from ASF header and third value is unknown.
Even if you request for only selected streams, server may send you all of them. So, request with num_streams=1 and stream_selection="ffff:1:0" will sometimes give you all streams ( instead of one ). Same rules apply to broadcast request, described further.
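
The stream_selection string can be assembled from the stream IDs found in the ASF header. The helper below is a hypothetical sketch. Since the meaning of the third value in each entry is unknown, the caller has to supply it, for example by copying the values from the sample above.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes entries of the form "ffff:<stream id>:<third value>", separated
 * by single spaces, into buf. Returns the string length, or -1 if buf is
 * too small. */
int asf_format_stream_selection(char *buf, size_t size,
                                const unsigned *stream_ids,
                                const unsigned *third_values,
                                size_t count)
{
    size_t used = 0;
    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(buf + used, size - used, "%sffff:%u:%u",
                         i ? " " : "", stream_ids[i], third_values[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

The number of entries passed as count is also the value to send in the stream-switch-count line.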
This is the HTTP request that starts downloading live (=broadcast) content.

"GET %s HTTP/1.0\r\n", file
"Accept: */*\r\n"
"User-Agent: NSPlayer/4.1.0.3856\r\n"
"Host: %s\r\n", server_name
"Pragma: no-cache,rate=1.000000,request-context=2\r\n"
"Pragma: xPlayStrm=1\r\n"
"Pragma: xClientGUID={c77e7400-738a-11d2-9add-0020af0a3278}\r\n"
"Pragma: stream-switch-count=1\r\n"
"Pragma: stream-switch-entry=ffff:1:0\r\n"
"Connection: Close\r\n\r\n"

The server reply to these requests consists of an arbitrary number of lines terminated by \n ( 0x0A ) or \r\n ( 0x0D 0x0A ) ( the HTTP header ), an empty line and the actual content.
The first line of the HTTP header has the form:
"HTTP/1.%d %d %s", version, errorcode, string
where version is 0 or 1, errorcode is the 3-digit HTTP status code and string is an optional server message. Possible codes include 200 - no error, 404 - file not found, and others.
Other important HTTP header lines:
"Content-Type: %s", content_type
  Content type of data. Possible values:
  application/octet-stream - 'real' binary ASF stream.
  audio/x-ms-wax, audio/x-ms-wma, video/x-ms-asf, video/x-ms-afs, video/x-ms-wvx, video/x-ms-wmv, video/x-ms-wma - ASX redirectors.
"Pragma: features=%s",features
  If "features" has substring "broadcast", the stream is live ( not prerecorded ).
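
The status line and the 'features' line can be checked with standard C string functions. The sketch below is only an illustration with invented function names. It matches header names case-sensitively, which is stricter than HTTP requires, and it expects the line terminator to be stripped or harmless.

```c
#include <stdio.h>
#include <string.h>

/* Parses "HTTP/1.<minor> <status> ..." and stores both numbers.
 * Returns 0 on success, -1 if the line does not have this form. */
int asf_parse_status_line(const char *line, int *minor_version, int *status)
{
    if (sscanf(line, "HTTP/1.%d %d", minor_version, status) != 2)
        return -1;
    return (*status >= 100 && *status <= 999) ? 0 : -1;
}

/* Returns 1 if the line is a "Pragma: features=" header whose value
 * contains the substring "broadcast", 0 otherwise. */
int asf_is_broadcast(const char *header_line)
{
    static const char prefix[] = "Pragma: features=";
    if (strncmp(header_line, prefix, sizeof prefix - 1) != 0)
        return 0;
    return strstr(header_line + sizeof prefix - 1, "broadcast") != NULL;
}
```

A client would call the first function on the first reply line and the second on every following header line until the empty line is reached.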
Headers are followed by actual content, separated into chunks. However, these chunks are different from the ones described in previous sections.

Field                      Type    Size (bytes)
Basic chunk type           UINT16  2
Chunk length               UINT16  2
Sequence number            UINT32  4
Unknown                    -       2
Chunk length confirmation  UINT16  2
Body data                  -       Variable

Chunk length corresponds to data that starts from sequence number field.
Basic chunk type can be 0x4424 ( Data follows ), 0x4524 ( Transfer complete ) and 0x4824 ( ASF header chunk follows ).
For type 0x4824 'body data' should be parsed according to the same rules as a local ASF file. It is arranged so that ASF recorder program would not need to leave any 'holes' in file while recording - this chunk includes all ASF content up to the beginning of first packet with compressed media.
For type 0x4424 'body data' contains a complete packet ( for example, first byte of this data is usually 0x82 ). Network transmission may send chunks that are shorter than pktsize from ASF file header, by chopping off padding section.
Some fields in ASF file header may be empty, especially for the live stream.
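
The 12-byte header of a network chunk can be parsed as sketched below. The code assumes little-endian integers ( with that byte order, type 0x4424 corresponds to the bytes 0x24 0x44 on the wire ), and it takes the body length to be 'chunk length' minus the 8 bytes that lie between the sequence number field and the body. The names are hypothetical, and the confirmation field is only stored, not validated.

```c
#include <stdint.h>
#include <stddef.h>

#define ASF_HTTP_CHUNK_DATA   0x4424  /* data follows */
#define ASF_HTTP_CHUNK_END    0x4524  /* transfer complete */
#define ASF_HTTP_CHUNK_HEADER 0x4824  /* ASF header chunk follows */

typedef struct {
    uint16_t type;
    uint16_t length;          /* counted from the sequence number field */
    uint32_t sequence;
    uint16_t length_confirm;
    size_t   body_length;     /* length minus the 8 bytes before the body */
} asf_http_chunk;

/* Returns 0 on success, -1 if fewer than 12 bytes are available or the
 * stored length is too small to cover the fixed fields. */
int asf_parse_http_chunk(const uint8_t *p, size_t len, asf_http_chunk *c)
{
    if (len < 12)
        return -1;
    c->type     = (uint16_t)(p[0] | (p[1] << 8));
    c->length   = (uint16_t)(p[2] | (p[3] << 8));
    c->sequence = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                  ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    /* p[8] and p[9] hold the unknown field */
    c->length_confirm = (uint16_t)(p[10] | (p[11] << 8));
    if (c->length < 8)
        return -1;
    c->body_length = (size_t)c->length - 8;
    return 0;
}
```

For type 0x4424 the body may be shorter than the packet size from the ASF header, so a recorder that writes a local file would pad it back to full size.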

Known GUIDs

 

#include <stdint.h>
#include <string.h>

struct GUID
{
    uint32_t v1;   // fixed-width types keep sizeof(GUID) at 16 bytes,
    uint16_t v2;   // which the memcmp() below relies on
    uint16_t v3;
    unsigned char v4[8];
    int operator==(const GUID& guid) const{return !memcmp(this, &guid, sizeof(GUID));}
};

/* GUID indicating audio stream header */
const GUID guid_audio_stream=
        { 0xF8699E40, 0x5B4D, 0x11CF, 0xA8, 0xFD, 0x00, 0x80, 0x5F, 0x5C, 0x44, 0x2B };

/* GUID indicating video stream header */
const GUID guid_video_stream=
        { 0xBC19EFC0, 0x5B4D, 0x11CF, 0xA8, 0xFD, 0x00, 0x80, 0x5F, 0x5C, 0x44, 0x2B };

/* GUID indicating that audio error concealment is absent */
const GUID guid_audio_conceal_none=
        { 0x49f1a440, 0x4ece, 0x11d0, 0xa3, 0xac, 0x00, 0xa0, 0xc9, 0x03, 0x48, 0xf6 };

/* GUID indicating that interleaved audio error concealment is present */
const GUID guid_audio_conceal_interleave=
        { 0xbfc3cd50, 0x618f, 0x11cf, 0x8b, 0xb2, 0x00, 0xaa, 0x00, 0xb4, 0xe2, 0x20 };

/* GUID for header chunk */
const GUID guid_header=
        { 0x75B22630, 0x668E, 0x11CF, 0xA6, 0xD9, 0x00, 0xAA, 0x00, 0x62, 0xCE, 0x6C };

/* GUID for data chunk */
const GUID guid_data_chunk=
        { 0x75b22636, 0x668e, 0x11cf, 0xa6, 0xd9, 0x00, 0xaa, 0x00, 0x62, 0xce, 0x6c };

/* GUID for index chunk */
const GUID guid_index_chunk=
        { 0x33000890, 0xe5b1, 0x11cf, 0x89, 0xf4, 0x00, 0xa0, 0xc9, 0x03, 0x49, 0xcb };

/* GUID for stream header chunk */
const GUID guid_stream_header=
        { 0xB7DC0791, 0xA9B7, 0x11CF, 0x8E, 0xE6, 0x00, 0xC0, 0x0C, 0x20, 0x53, 0x65 };

/* ASF 2.0 header */
const GUID guid_header_2_0=
        { 0xD6E229D1, 0x35da, 0x11d1, 0x90, 0x34, 0x00, 0xa0, 0xc9, 0x03, 0x49, 0xbe };

/* File header object */
const GUID guid_file_header=
        { 0x8CABDCA1, 0xA947, 0x11CF, 0x8E, 0xE4, 0x00, 0xC0, 0x0C, 0x20, 0x53, 0x65 };

Credits

 

Most of the information contained in this document was collected by Avery Lee <uleea05 at umail.ucsb.edu> and by the unknown author of the ASFRecorder program. Translated from C/C++ into readable English by yours truly <divx at euro.ru>. Comments and improvements are welcome.

Last modified on April 5, 2001

 

