wave file format

最新推荐文章于 2020-06-29 17:59:36 发布

littlemumu

最新推荐文章于 2020-06-29 17:59:36 发布

阅读量774

点赞数

分类专栏： audio基础

audio基础专栏收录该内容

4 篇文章 0 订阅

订阅专栏

wave file format is a file format for storing digital audio (waveform) data. itsupports a variety of bit resolutions, sample rates, and channels of audio. this format isvery popular upon ibm pc (clone) platforms, and is widely used in professional programsthat process digital audio waveforms. it takes into account some pecularities of the intelcpu such as little endian byte order.

this format uses microsoft's version of the electronic arts interchange file formatmethod for storing data in "chunks".

data types

a c-like language will be used to describe the data structures in the file. a few extradata types that are not part of standard c, but which will be used in this document, are:

pstring	pascal-style string, a one-byte count followed by that many text bytes. the total number of bytes in this data type should be even. a pad byte can be added to the end of the text to accomplish this. this pad byte is not reflected in the count.
id	a chunk id (ie, 4 ascii bytes).

also note that when you see an array with no size specification (e.g., char ckdata[];),this indicates a variable-sized array in our c-like language. this differs from standard carrays.

constants

decimal values are referred to as a string of digits, for example 123, 0, 100 are alldecimal numbers. hexadecimal values are preceded by a 0x - e.g., 0x0a, 0x1, 0x64.

data organization

all data is stored in 8-bit bytes, arranged in intel 80x86 (ie, little endian) format.the bytes of multiple-byte values are stored with the low-order (ie, least significant)bytes first. data bits are as follows (ie, shown with bit numbers on top):

         7  6  5  4  3  2  1  0
       +-----------------------+
 char: | lsb               msb |
       +-----------------------+

         7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8
       +-----------------------+-----------------------+
short: | lsb     byte 0        |       byte 1      msb |
       +-----------------------+-----------------------+

         7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
       +-----------------------+-----------------------+-----------------------+-----------------------+
 long: | lsb     byte 0        |       byte 1          |         byte 2	       |       byte 3      msb |
       +-----------------------+-----------------------+-----------------------+-----------------------+

file structure

a wave file is a collection of a number of different types of chunks. there is arequired format ("fmt ") chunk which contains important parameters describingthe waveform, such as its sample rate. the data chunk, which contains the actual waveformdata, is also required. all other chunks are optional. among the other optional chunks areones which define cue points, list instrument parameters, store application-specificinformation, etc. all of these chunks are described in detail in the following sections ofthis document.

all applications that use wave must be able to read the 2 required chunks and canchoose to selectively ignore the optional chunks. a program that copies a wave should copyall of the chunks in the wave, even those it chooses not to interpret.

there are no restrictions upon the order of the chunks within a wave file, with theexception that the format chunk must precede the data chunk. some inflexibly writtenprograms expect the format chunk as the first chunk (after the riff header) although theyshouldn't because the specification doesn't require this.

here is a graphical overview of an example, minimal wave file. it consists of a singlewave containing the 2 required chunks, a format and a data chunk.

 __________________________
| riff wave chunk	   |
|   groupid  = 'riff'      |
|   rifftype = 'wave'      |
|    __________________    |
|   | format chunk     |   |
|   |	ckid = 'fmt '  |   |
|   |__________________|   |
|    __________________    |
|   | sound data chunk |   |
|   |	ckid = 'data'  |   |
|   |__________________|   |
|__________________________|

a bastardized standard

the wave format is sort of a bastardized standard that was concocted by too many"cooks" who didn't properly coordinate the addition of "ingredients"to the "soup". unlike with the aiff standard which was mostly designed by asmall, coordinated group, the wave format has had all manner of much-too-independent,uncoordinated aberrations inflicted upon it. the net result is that there are far too manychunks that may be found in a wave file -- many of them duplicating the same informationfound in other chunks (but in an unnecessarily different way) simply because there havebeen too many programmers who took too many liberties with unilaterally adding their ownadditions to the wave format without properly coming to a concensus of what everyone elseneeded (and therefore it encouraged an "every man for himself" attitude towardadding things to this "standard"). one example is the instrument chunk versusthe sampler chunk. another example is the note versus label chunks in an associated datalist. i don't even want to get into the totally irresponsible proliferation of compressedformats. (ie, it seems like everyone and his pet dachshound has come up with somecompressed version of storing wave data -- like we need 100 different ways to do that).furthermore, there are lots of inconsistencies, for example how 8-bit data is unsigned,but 16-bit data is signed.

i've attempted to document only those aspects that you're very likely to encounter in awave file. i suggest that you concentrate upon these and refuse to support the work ofprogrammers who feel the need to deviate from a standard with inconsistent, proprietary,self-serving, unnecessary extensions. please do your part to rein in half-ass programming.

sample points and sample frames

a large part of interpreting wave files revolves around the two concepts of samplepoints and sample frames.

a sample point is a value representing a sample of a sound at a given moment in time.for waveforms with greater than 8-bit resolution, each sample point is stored as a linear,2's-complement value which may be from 9 to 32 bits wide (as determined by thewbitspersample field in the format chunk, assuming pcm format -- an uncompressed format).for example, each sample point of a 16-bit waveform would be a 16-bit word (ie, two 8-bitbytes) where 32767 (0x7fff) is the highest value and -32768 (0x8000) is the lowest value.for 8-bit (or less) waveforms, each sample point is a linear, unsigned byte where 255 isthe highest value and 0 is the lowest value. obviously, this signed/unsigned sample pointdiscrepancy between 8-bit and larger resolution waveforms was one of those"oops" scenarios where some microsoft employee decided to change the signsometime after 8-bit wave files were common but 16-bit wave files hadn't yet appeared.

because most cpu's read and write operations deal with 8-bit bytes, it was decided thata sample point should be rounded up to a size which is a multiple of 8 when stored in awave. this makes the wave easier to read into memory. if your adc produces a sample pointfrom 1 to 8 bits wide, a sample point should be stored in a wave as an 8-bit byte (ie,unsigned char). if your adc produces a sample point from 9 to 16 bits wide, a sample pointshould be stored in a wave as a 16-bit word (ie, signed short). if your adc produces asample point from 17 to 24 bits wide, a sample point should be stored in a wave as threebytes. if your adc produces a sample point from 25 to 32 bits wide, a sample point shouldbe stored in a wave as a 32-bit doubleword (ie, signed long). etc.

furthermore, the data bits should be left-justified, with any remaining (ie, pad) bitszeroed. for example, consider the case of a 12-bit sample point. it has 12 bits, so thesample point must be saved as a 16-bit word. those 12 bits should be left-justified sothat they become bits 4 to 15 inclusive, and bits 0 to 3 should be set to zero. shownbelow is how a 12-bit sample point with a value of binary 101000010111 is formattedleft-justified as a 16-bit word.

 ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
|   |	  |   |   |   |	|   |   |   |   |   |   |   |   |   |   |
| 1   0   1   0   0   0   0   1   0   1   1   1   0   0   0   0 |
|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|
 <---------------------------------------------> <------------->
    12 bit sample point is left justified          rightmost
                                                  4 bits are
                                                  zero padded

but note that, because the wave format uses intel little endian byte order, the lsb isstored first in the wave file as so:

 ___ ___ ___ ___ ___ ___ ___ ___    ___ ___ ___ ___ ___ ___ ___ ___
|   |	  |   |   |   |	|   |   |  |   |   |   |   |   |   |   |   |
| 0   1   1   1   0   0   0   0 |  | 1   0   1   0   0   0   0   1 |
|___|___|___|___|___|___|___|___|  |___|___|___|___|___|___|___|___|
 <-------------> <------------->    <----------------------------->
   bits 0 to 3     4 pad bits                 bits 4 to 11

for multichannel sounds (for example, a stereo waveform), single sample points fromeach channel are interleaved. for example, assume a stereo (ie, 2 channel) waveform.instead of storing all of the sample points for the left channel first, and then storingall of the sample points for the right channel next, you "mix" the two channels'sample points together. you would store the first sample point of the left channel. next,you would store the first sample point of the right channel. next, you would store thesecond sample point of the left channel. next, you would store the second sample point ofthe right channel, and so on, alternating between storing the next sample point of eachchannel. this is what is meant by interleaved data; you store the next sample point ofeach of the channels in turn, so that the sample points that are meant to be"played" (ie, sent to a dac) simultaneously are stored contiguously.

the sample points that are meant to be "played" (ie, sent to a dac)simultaneously are collectively called a sample frame. in the example of our stereowaveform, every two sample points makes up another sample frame. this is illustrated belowfor that stereo example.

  sample       sample              sample
  frame 0      frame 1             frame n
 _____ _____ _____ _____         _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____|       |_____|_____|
 _____
|     | = one sample point
|_____|

for a monophonic waveform, a sample frame is merely a single sample point (ie, there'snothing to interleave). for multichannel waveforms, you should follow the conventionsshown below for which order to store channels within the sample frame. (ie, below, asingle sample frame is displayed for each example of a multichannel waveform).

  channels       1         2
             _________ _________ 
            | left    | right   |
  stereo    |         |         |
            |_________|_________|


                 1         2         3
             _________ _________ _________ 
            | left    | right   | center  |
  3 channel |         |         |         |
            |_________|_________|_________|

                 1         2         3         4
             _________ _________ _________ _________ 
            | front   | front   | rear    | rear    |
  quad      | left    | right   | left    | right   |
            |_________|_________|_________|_________|

                 1         2         3         4
             _________ _________ _________ _________ 
            | left    | center  | right   | surround|
  4 channel |         |         |         |         |
            |_________|_________|_________|_________|

                 1         2         3         4         5         6
             _________ _________ _________ _________ _________ _________
            | left    | left    | center  | right   | right   |surround |
  6 channel | center  |         |         | center  |         |         |
            |_________|_________|_________|_________|_________|_________|

the sample points within a sample frame are packed together; there are no unused bytesbetween them. likewise, the sample frames are packed together with no pad bytes.

note that the above discussion outlines the format of data within an uncompressed datachunk. there are some techniques of storing compressed data in a data chunk. obviously,that data would need to be uncompressed, and then it will adhere to the above layout.

the format chunk

the format (fmt) chunk describes fundamental parameters of the waveform data such assample rate, bit resolution, and how many channels of digital audio are stored in thewave.

#define formatid 'fmt '   /* chunkid for format chunk. note: there is a space at the end of this id. */

typedef struct {
  id             chunkid;
  long           chunksize;

  short          wformattag;
  unsigned short wchannels;
  unsigned long  dwsamplespersec;
  unsigned long  dwavgbytespersec;
  unsigned short wblockalign;
  unsigned short wbitspersample;

/* note: there may be additional fields here, depending upon wformattag. */

} formatchunk;

the id is always "fmt ". the chunksize field is the number of bytes inthe chunk. this does not include the 8 bytes used by id and size fields. for the formatchunk, chunksize may vary according to what "format" of wave file is specified(ie, depends upon the value of wformattag).

wave data may be stored without compression, in which case the sample points are storedas described in sample points and sample frames. alternately, different forms ofcompression may be used when storing the sound data in the data chunk. with compression,each sample point may take a differing number of bytes to store. the wformattag indicateswhether compression is used when storing the data.

if compression is used (ie, wformattag is some value other than 1), then there will beadditional fields appended to the format chunk which give needed information for a programwishing to retrieve and decompress that stored data. the first such additional field willbe an unsigned short that indicates how many more bytes have been appended (after thisunsigned short). furthermore, compressed formats must have a fact chunk which contains anunsigned long indicating the size (in sample points) of the waveform after it has beendecompressed. there are (too) many compressed formats. details about them can be gottenfrom microsoft's web site.

if no compression is used (ie, wformattag = 1), then there are no further fields.

the wchannels field contains the number of audio channels for the sound. a value of 1means monophonic sound, 2 means stereo, 4 means four channel sound, etc. any number ofaudio channels may be represented. for multichannel sounds, single sample points from eachchannel are interleaved. a set of interleaved sample points is called a sample frame.

the actual waveform data is stored in another chunk, the data chunk, which will bedescribed later.

the dwsamplespersec field is the sample rate at which the sound is to be played back insample frames per second (ie, hertz). the 3 standard mpc rates are 11025, 22050, and 44100khz, although other rates may be used.

the dwavgbytespersec field indicates how many bytes play every second. dwavgbytespersecmay be used by an application to estimate what size ram buffer is needed to properlyplayback the wave without latency problems. its value should be equal to the followingformula rounded up to the next whole number:

dwsamplespersec * wblockalign

the wblockalign field should be equal to the following formula, rounded to the nextwhole number:

wchannels * (wbitspersample % 8)

essentially, wblockalign is the size of a sample frame, in terms of bytes. (eg, asample frame for a 16-bit mono wave is 2 bytes. a sample frame for a 16-bit stereo wave is4 bytes. etc).

the wbitspersample field indicates the bit resolution of a sample point (ie, a 16-bitwaveform would have wbitspersample = 16).

one, and only one, format chunk is required in every wave.

data chunk

the data (data) chunk contains the actual sample frames (ie, all channels of waveformdata).

#define dataid 'data'  /* chunk id for data chunk */

typedef struct {
  id             chunkid;
  long           chunksize;

  unsigned char  waveformdata[];
} datachunk;

the id is always data. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields nor any possible pad byte needed to makethe chunk an even size (ie, chunksize is the number of remaining bytes in the chunk afterthe chunksize field, not counting any trailing pad byte).

remember that the bit resolution, and other information is gotten from the formatchunk.

the following discussion assumes uncompressed data.

the waveformdata array contains the actual waveform data. the data is arranged intowhat are called sample frames. for more information on the arrangment of data,see "sample points and sample frames".

you can determine how many bytes of actual waveform data there is from the data chunk'schunksize field. the number of sample frames in waveformdata is determined by dividingthis chunksize by the format chunk's wblockalign.

the data chunk is required. one, and only one, data chunk may appear in a wave.

another way of storing waveform data

so, you're thinking "this wave format isn't that bad. it seems to make sense andthere aren't all that many inconsistencies, duplications, and inefficiencies". youfool! we're just getting started with our first excursion into unnecessaryinconsistencies, duplications, and inefficiency.

sure, countless brain-damaged programmers have inflicted literally dozens of compresseddata formats upon the data chunk, but apparently someone felt that even this wasn't enoughto make your life difficult in trying to support wave files. no, some half-wit decidedthat it would be a good idea to screw around with storing waveform data in something otherthan one data chunk. noooooooooooooo!!!!!!

for some god-forsaken reason, someone came up with the idea of using an imbedded ifflist inside of the wave file. nooooooooooooooooo!!!!!!!! and this "wave list"would contain multiple 'data' and 'slnt' chunks. noooooooooooooooo!!!! the type id forthis list is 'wavl'.

i strongly suggest that you refuse to support any wave file that exhibits this wavelist nonsense. there's no need for it, and hopefully, the misguided programmer whoconjured it up will be embarrassed into hanging his head in shame when nobody agrees tosupport his foolishness. just say "noooooooooooooo!!!!"

cue chunk

the cue chunk contains one or more "cue points" or "markers". eachcue point references a specific offset within the waveformdata array, and has its owncuepoint structure within this chunk.

in conjunction with the playlist chunk, the cue chunk can be used to store loopinginformation.

cuepoint structure

typedef struct {
  long    dwidentifier;
  long    dwposition;
  id      fccchunk;
  long    dwchunkstart;
  long    dwblockstart;
  long    dwsampleoffset;
} cuepoint;

the dwidentifier field contains a unique number (ie, different than the id number ofany other cuepoint structure). this is used to associate a cuepoint structure with otherstructures used in other chunks which will be described later.

the dwposition field specifies the position of the cue point within the "playorder" (as determined by the playlist chunk. see that chunk for a discussion of theplay order).

the fccchunk field specifies the chunk id of the data or wave list chunk which actuallycontains the waveform data to which this cuepoint refers. if there is only one data chunkin the file, then this field is set to the id 'data'. on the other hand, if the filecontains a wave list (which can contain both 'data' and 'slnt' chunks), then fccchunk willspecify 'data' or 'slnt' depending upon in which type of chunk the referenced waveformdata is found.

the dwchunkstart and dwblockstart fields are set to 0 for an uncompressed wave filethat contains one 'data' chunk. these fields are used only for wave files that contain awave list (with multiple 'data' and 'slnt' chunks), or for a compressed file containing a'data' chunk. (actually, in the latter case, dwchunkstart is also set to 0, and onlydwblockstart is used). again, i want to emphasize that you can avoid all of thisunnecessary crap if you avoid hassling with compressed files, or wave lists, and insteadstick to the sensible basics.

the dwchunkstart field specifies the byte offset of the start of the 'data' or 'slnt'chunk which actually contains the waveform data to which this cuepoint refers. this offsetis relative to the start of the first chunk within the wave list. (ie, it's the byteoffset, within the wave list, of where the 'data' or 'slnt' chunk of interest appears. thefirst chunk within the list would be at an offset of 0).

the dwblockstart field specifies the byte offset of the start of the block containingthe position. this offset is relative to the start of the waveform data within the 'data'or 'slnt' chunk.

the dwsampleoffset field specifies the sample offset of the cue point relative to thestart of the block. in an uncompressed file, this equates to simply being the offsetwithin the waveformdata array. unfortunately, the wave documentation is much tooambiguous, and doesn't define what it means by the term "sample offset". thiscould mean a byte offset, or it could mean counting the sample points (for example, in a16-bit wave, every 2 bytes would be 1 sample point), or it could even mean sample frames(as the loop offsets in aiff are specified). who knows? the guy who conjured up the cuechunk certainly isn't saying. i'm assuming that it's a byte offset, like the above 2fields.

cue chunk

#define cueid 'cue '  /* chunk id for cue chunk */

typedef struct {
  id        chunkid;
  long      chunksize;

  long      dwcuepoints;
  cuepoint  points[];
} cuechunk;

the id is always cue . chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields.

the dwcuepoints field is the number of cuepoint structures in the cue chunk. ifdwcuepoints is not 0, it is followed by that many cuepoint structures, one after theother. because all fields in a cuepoint structure are an even number of bytes, the lengthof any cuepoint will always be even. thus, cuepoints are packed together with no unusedbytes between them. the cuepoints need not be placed in any particular order.

the cue chunk is optional. no more than one cue chunk can appear in a wave.

playlist chunk

the playlist (plst) chunk specifies a play order for a series of cue points. the cuechunk contains all of the cue points, but the playlist chunk determines how those cuepoints are used when playing back the waveform (ie, which cue points represent loopedsections, and in what order those loops are "played"). the playlist chunkcontains one or more segment structures, each of which identifies a looped section of thewaveform (in conjunction with the cuepoint structure with which it is associated).

segment structure

typedef struct {
  long    dwidentifier;
  long    dwlength;
  long    dwrepeats;
} segment;

the dwidentifier field contains a unique number (ie, different than the id number ofany other segment structure). this field should correspond with the dwindentifier field ofsome cuepoint stored in the cue chunk. in other words, this segment structure contains thelooping information associated with that cuepoint structure with the same id number.

the dwlength field specifies the length of the section in samples (ie, the length ofthe looped section). note that the start position of the loop would be the dwsampleoffsetof the referenced cuepoint structure in the cue chunk. (or, you may need to hassle withthe dwchunkstart and dwblockstart fields as well if dealing with a wave list or compresseddata).

the dwrepeats field specifies the number of times to play the loop. i assume that avalue of 1 means to repeat this loop once only, but the wave documentation is veryincomplete and omits this important information. i have no idea how you would specify aninfinitely repeating loop. certainly, the person who conjured up the playlist chunkappears to have no idea whatsoever. due to the ambiguities, inconsistencies,inefficiencies, and omissions of the cue and playlist chunks, i very much recommend thatyou use the sampler chunk (described later) to replace them.

playlist chunk

#define playlistid 'plst'  /* chunk id for playlist chunk */

typedef struct {
  id        chunkid;
  long      chunksize;

  long      dwsegments;
  segment   segments[];
} playlistchunk;

the id is always plst. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields.

the dwsegments field is the number of segment structures in the playlist chunk. ifdwsegments is not 0, it is followed by that many segment structures, one after the other.because all fields in a segment structure are an even number of bytes, the length of anysegment will always be even. thus, segments are packed together with no unused bytesbetween them. the segments need not be placed in any particular order.

associated data list

the associated data list contains text "labels" or "names" that areassociated with the cuepoint structures in the cue chunk. in other words, this listcontains the text labels for those cuepoints.

again, we're talking about another imbedded iff list within the wave file.noooooooooooooo!!!! what's a list? a list is simply a "master chunk" thatcontains several "sub-chunks". just like with any other chunk, the "masterchunk" has an id and chunksize, but inside of this chunk are sub-chunks, each withits own id and chunksize. of course, the chunksize for the master chunk (ie, list)includes the size of all of these sub-chunks (including their id and chunksize fields).

the "type id" for the associated data list is "adtl". remember thatan iff list header has 3 fields:

typedef struct {
  id      listid;      /* 'list' */
  long    chunksize;   /* includes the type id below */
  id      typeid;     /* 'adtl' */
} listheader;

there are several sub-chunks that may be found inside of the associated data list. theones that are important to wave format have ids of "labl", "note", or"ltxt". ignore the rest. here are those 3 sub-chunks and their fields:

the associated data list is optional. the wave documentation doesn't specify if morethan one can be contained in a wave file.

label chunk

#define labelid 'labl'  /* chunk id for label chunk */

typedef struct {
  id      chunkid;
  long    chunksize;

  long    dwidentifier;
  char    dwtext[];
} labelchunk;

the id is always labl. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields nor any possible pad byte needed to makethe chunk an even size (ie, chunksize is the number of remaining bytes in the chunk afterthe chunksize field, not counting any trailing pad byte).

the dwidentifier field contains a unique number (ie, different than the id number ofany other label chunk). this field should correspond with the dwindentifier field of somecuepoint stored in the cue chunk. in other words, this label chunk contains the text labelassociated with that cuepoint structure with the same id number.

the dwtext array contains the text label. it should be a null-terminated string. (thenull byte is included in the chunksize, therefore the length of the string, including thenull byte, is chunksize - 4).

note chunk

#define noteid 'note'  /* chunk id for note chunk */

typedef struct {
  id      chunkid;
  long    chunksize;

  long    dwidentifier;
  char    dwtext[];
} notechunk;

the note chunk, whose id is note, is otherwise exactly the same as the labelchunk (ie, same fields). see what i mean about pointless duplication? but, in theory, anote chunk contains a "comment" about a cuepoint, whereas the label chunk issupposed to contain the actual cuepoint label. so, it's possible that you'll find both anote and label for a specific cuepoint, each containing different text.

labeled text chunk

#define labeltextid 'ltxt'  /* chunk id for labeled text chunk */

typedef struct {
  id      chunkid;
  long    chunksize;

  long    dwidentifier;
  long    dwsamplelength;
  long    dwpurpose;
  short   wcountry;
  short   wlanguage;
  short   wdialect;
  short   wcodepage;
  char    dwtext[];
} labeltextchunk;

the id is always ltxt. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields nor any possible pad byte needed to makethe chunk an even size (ie, chunksize is the number of remaining bytes in the chunk afterthe chunksize field, not counting any trailing pad byte).

the dwidentifier field is the same as the label chunk.

the dwsamplelength field specifies the number of sample points in the segment ofwaveform data. in other words, a labeled text chunk contains a label for a sectionof the waveform data, not just a specific point, for example the looped section of awaveform.

the dwpurpose field specifies the type or purpose of the text. for example, dwpurposecan contain an id like "scrp" for script text or "capt" forclose-caption text. how is this related to waveform data? well, it isn't really. it's justthat associated data lists are used in other file formats, so they contain generic fieldsthat sometimes don't have much relevance to waveform data.

the wcountry, wlanguage, and wcodepage fields specify the country code,language/dialect, and code page for the text. an application typically queries thesevalues from the operating system.

sampler chunk

the sampler (smpl) chunk defines basic parameters that an instrument, such as a midisampler, could use to play the waveform data. most importantly, it includes informationabout looping the waveform (ie, during playback, to "sustain" the waveform). ofcourse, as you've come to expect from the wave file format, it duplicates some of theinformation that can be found in the cue and playlist chunks, but fortunately, in a moresensible, consistent, better-documented way.

#define samplerid 'smpl'  /* chunk id for sampler chunk */

typedef struct {
  id             chunkid;
  long           chunksize;

  long           dwmanufacturer;
  long           dwproduct;
  long           dwsampleperiod;
  long           dwmidiunitynote;
  long           dwmidipitchfraction;
  long           dwsmpteformat;
  long           dwsmpteoffset;
  long           csampleloops;
  long           cbsamplerdata;
  struct sampleloop loops[];
} samplerchunk;

the id is always smpl. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields nor any possible pad byte needed to makethe chunk an even size (ie, chunksize is the number of remaining bytes in the chunk afterthe chunksize field, not counting any trailing pad byte).

the dwmanufacturer field contains the mma manufacturer code for the intended sampler.each manufacturer of midi products has his own id assigned to him by the midimanufacturer's association. the high byte of dwmanufacturer indicates the number of loworder bytes (1 or 3) that are valid for the manufacturer code. for example, this valuewill be 0x01000013 for digidesign (the mma manufacturer code is one byte, 0x13); whereas0x03000041 identifies microsoft (the mma manufacturer code is three bytes, 0x00 0x000x41). if the wave is not intended for a specific manufacturer, then this field should beset to 0.

the dwproduct field contains the product code (ie, model id) of the intended samplerfor the dwmanufacturer. contact the manufacturer of the sampler to ascertain the sampler'smodel id. if the wave is not intended for a specific manufacturer's product, then thisfield should be set to 0.

the dwsampleperiod field specifies the period of one sample in nanoseconds (normally1/nsamplespersec from the format chunk. but note that this field allows finer tuning thannsamplespersec). for example, 44.1 khz would be specified as 22675 (0x00005893).

the dwmidiunitynote field is the midi note number at which the instrument plays backthe waveform data without pitch modification (ie, at the same sample rate that was usedwhen the waveform was created). this value ranges 0 through 127, inclusive. middle c is60.

the dwmidipitchfraction field specifies the fraction of a semitone up from thespecified dwmidiunitynote. a value of 0x80000000 is 1/2 semitone (50 cents); a value of0x00000000 represents no fine tuning between semitones.

the dwsmpteformat field specifies the smpte time format used in the dwsmpteoffsetfield. possible values are:

0  = no smpte offset (dwsmpteoffset should also be 0)
24 = 24 frames per second
25 = 25 frames per second
29 = 30 frames per second with frame dropping ('30 drop')
30 = 30 frames per second

the dwsmpteoffset field specifies a time offset for the sample if it is to besyncronized or calibrated according to a start time other than 0. the format of this valueis 0xhhmmssff. hh is a signed hours value [-23..23]. mm is an unsigned minutes value[0..59]. ss is unsigned seconds value [0..59]. ff is an unsigned value [0..( - 1)].

the csampleloops field is the number (count) of sampleloop structures that are appendedto this chunk. these structures immediately follow the cbsamplerdata field. this fieldwill be 0 if there are no sampleloop structures.

the cbsamplerdata field specifies the size (in bytes) of any optional fields that anapplication wishes to append to this chunk. an application which needed to save additionalinformation (ie, beyond the above fields) may append additional fields to the end of thischunk, after all of the sampleloop structures. these additional fields are also reflectedin the chunksize, and remember that the chunk should be padded out to an even number ofbytes. the cbsamplerdata field will be 0 if no additional information is appended to thechunk.

what follows the above fields are any sampleloop structures. each sampleloop structuredefines one loop (ie, the start and end points of the loop, and how many times it plays).what follows any sampleloop structures are any additional, proprietary sampler informationthat an application chooses to store.

sampleloop structure

typedef struct {
  long  dwidentifier;
  long  dwtype;
  long  dwstart;
  long  dwend;
  long  dwfraction;
  long  dwplaycount;
} sampleloop;

the dwidentifier field contains a unique number (ie, different than the id number ofany other sampleloop structure). this field may correspond with the dwidentifier field ofsome cuepoint stored in the cue chunk. in other words, the cuepoint structure which hasthe same id number would be considered to be describing the same loop as this sampleloopstructure. furthermore, this field corresponds to the dwindentifier field of any labelstored in the associated data list. in other words, the text string (within some chunk inthe associated data list) which has the same id number would be considered to be thisloop's "name" or "label".

the dwtype field is the loop type (ie, how the loop plays back) as so:

0 - loop forward (normal)
1 - alternating loop (forward/backward)
2 - loop backward
3-31 - reserved for future standard types
32-? - sampler specific types (manufacturer defined)

the dwstart field specifies the startpoint of the loop. in other words, it's the byteoffset from the start of waveformdata[], where an offset of 0 would be at the start of thewaveformdata[] array (ie, the loop start is at the very first sample point).

the dwend field specifies the endpoint of the loop (ie, a byte offset).

the dwfraction field allows fine-tuning for loop fractional areas between samples.values range from 0x00000000 to 0xffffffff. a value of 0x80000000 represents 1/2 of asample length.

the dwplaycount field is the number of times to play the loop. a value of 0 specifiesan infinite sustain loop (ie, the wave keeps looping until some external force interruptsplayback, such as the musician releasing the key that triggered that wave's playback).

the sampler chunk is optional. i don't know as if there is any limit of one per wavefile. i don't see why there should be such a limit, since after all, an application mayneed to deal with several midi samplers.

the instrument chunk format

the instrument chunk contains some of the same type of information as the samplerchunk. so what else is new?

#define instrumentid 'inst'  /* chunkid for instruments chunk */

typedef struct {
  id     chunkid;
  long   chunksize;

  unsigned char unshiftednote;
  char          finetune;
  char          gain;
  unsigned char lownote;
  unsigned char highnote;
  unsigned char lowvelocity;
  unsigned char highvelocity;
} instrumentchunk;

the id is always inst. chunksize should always be 7 since there are no fields ofvariable length.

the unshiftednote field is the same as the sampler chunk's dwmidiunitynote field.

the finetune field determines how much the instrument should alter the pitch of thesound when it is played back. units are in cents (1/100 of a semitone) and range from -50to +50. negative numbers mean that the pitch of the sound should be lowered, whilepositive numbers mean that it should be raised. while not the same measurement is used,this field serves the same purpose as the sampler chunk's dwfraction field.

the gain field is the amount by which to change the gain of the sound when it isplayed. units are decibels. for example, 0db means no change, 6db means double the valueof each sample point (ie, every additional 6db doubles the gain), while -6db means halvethe value of each sample point.

the lownote and highnote fields specify the suggested midi note range on a keyboard forplayback of the waveform data. the waveform data should be played if the instrument isrequested to play a note between the low and high note numbers, inclusive. theunshiftednote does not have to be within this range.

the lowvelocity and highvelocity fields specify the suggested range of midi velocitiesfor playback of the waveform data. the waveform data should be played if the note-onvelocity is between low and high velocity, inclusive. the range is 1 (lowest velocity)through 127 (highest velocity), inclusive.

the instrument chunk is optional. no more than 1 instrument chunk can appear in onewave.

audio interchange file format (aiff)

audio interchange file format (or aiff) is a file format for storing digital audio(waveform) data. it supports a variety of bit resolutions, sample rates, and channels ofaudio. this format is very popular upon apple platforms, and is widely used inprofessional programs that process digital audio waveforms.

this format uses the electronic arts interchange file format method for storing data in"chunks". you should read the article about interchange file formatbefore proceeding.

data types

a c-like language will be used to describe the data structures in the file. a few extradata types that are not part of standard c, but which will be used in this document, are:

extended	80 bit ieee standard 754 floating point number (standard apple numeric environment [sane] data type extended). this would be a 10 byte field.
pstring	pascal-style string, a one-byte count followed by that many text bytes. the total number of bytes in this data type should be even. a pad byte can be added to the end of the text to accomplish this. this pad byte is not reflected in the count.
id	a chunk id (ie, 4 ascii bytes) as described in about interchange file format.

also note that when you see an array with no size specification (e.g., char ckdata[];),this indicates a variable-sized array in our c-like language. this differs from standard carrays.

constants

decimal values are referred to as a string of digits, for example 123, 0, 100 are alldecimal numbers. hexadecimal values are preceded by a 0x - e.g., 0x0a, 0x1, 0x64.

data organization

all data is stored in motorola 68000 (ie, big endian) format. the bytes ofmultiple-byte values are stored with the high-order (ie, most significant) bytes first.data bits are as follows (ie, shown with bit numbers on top):

         7  6  5  4  3  2  1  0
       +-----------------------+
 char: | msb               lsb |
       +-----------------------+

        15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
       +-----------------------+-----------------------+
short: | msb     byte 0        |       byte 1      lsb |
       +-----------------------+-----------------------+

        31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
       +-----------------------+-----------------------+-----------------------+-----------------------+
 long: | msb     byte 0        |       byte 1          |         byte 2	       |       byte 3      lsb |
       +-----------------------+-----------------------+-----------------------+-----------------------+

file structure

an audio iff file is a collection of a number of different types of chunks. there is arequired common chunk which contains important parameters describing the waveform, such asits length and sample rate. the sound data chunk, which contains the actual waveform data,is also required if the waveform data has a length greater than 0 (ie, there actually iswaveform data in the form). all other chunks are optional. among the other optional chunksare ones which define markers, list instrument parameters, store application-specificinformation, etc. all of these chunks are described in detail in the following sections ofthis document.

all applications that use form aiff must be able to read the 2 required chunks and canchoose to selectively ignore the optional chunks. a program that copies a form aiff shouldcopy all of the chunks in the form aiff, even those it chooses not to interpret.

there are no restrictions upon the order of the chunks within a form aiff.

here is a graphical overview of an example, minimal aiff file. it consists of a singleform aiff containing the 2 required chunks, a common chunk and a sound data chunk.

 __________________________
| form aiff chunk	   |
|   ckid  = 'form'         |
|   formtype = 'aiff'      |
|    __________________    |
|   | common chunk     |   |
|   |	ckid = 'comm'  |   |
|   |__________________|   |
|    __________________    |
|   | sound data chunk |   |
|   |	ckid = 'ssnd'  |   |
|   |__________________|   |
|__________________________|

sample points and sample frames

a large part of interpreting audio iff files revolves around the two concepts of samplepoints and sample frames.

a sample point is a value representing a sample of a sound at a given moment in time.each sample point is stored as a linear, 2's-complement value which may be from 1 to 32bits wide (as determined by the samplesize field in the common chunk). for example, eachsample point of an 8-bit waveform would be an 8-bit byte (ie, a signed char).

because most cpu's read and write operations deal with 8-bit bytes, it was decided thata sample point should be rounded up to a size which is a multiple of 8 when stored in anaiff. this makes the aiff easier to read into memory. if your adc produces a sample pointfrom 1 to 8 bits wide, a sample point should be stored in an aiff as an 8-bit byte (ie,signed char). if your adc produces a sample point from 9 to 16 bits wide, a sample pointshould be stored in an aiff as a 16-bit word (ie, signed short). if your adc produces asample point from 17 to 24 bits wide, a sample point should be stored in an aiff as threebytes. if your adc produces a sample point from 25 to 32 bits wide, a sample point shouldbe stored in an aiff as a 32-bit doubleword (ie, signed long). etc.

 ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
|   |	  |   |   |   |	|   |   |   |   |   |   |   |   |   |   |
| 1   0   1   0   0   0   0   1   0   1   1   1   0   0   0   0 |
|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|
 <---------------------------------------------> <------------->
    12 bit sample point is left justified          rightmost
                                                  4 bits are
                                                  zero padded

  sample       sample              sample
  frame 0      frame 1             frame n
 _____ _____ _____ _____         _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____|       |_____|_____|
 _____
|     | = one sample point
|_____|

  channels       1         2
             _________ _________ 
            | left    | right   |
  stereo    |         |         |
            |_________|_________|


                 1         2         3
             _________ _________ _________ 
            | left    | right   | center  |
  3 channel |         |         |         |
            |_________|_________|_________|

                 1         2         3         4
             _________ _________ _________ _________ 
            | front   | front   | rear    | rear    |
  quad      | left    | right   | left    | right   |
            |_________|_________|_________|_________|

                 1         2         3         4
             _________ _________ _________ _________ 
            | left    | center  | right   | surround|
  4 channel |         |         |         |         |
            |_________|_________|_________|_________|

                 1         2         3         4         5         6
             _________ _________ _________ _________ _________ _________
            | left    | left    | center  | right   | right   |surround |
  6 channel | center  |         |         | center  |         |         |
            |_________|_________|_________|_________|_________|_________|

the sample points within a sample frame are packed together; there are no unused bytesbetween them. likewise, the sample frames are packed together with no pad bytes.

the common chunk

the common chunk describes fundamental parameters of the waveform data such as samplerate, bit resolution, and how many channels of digital audio are stored in the form aiff.

#define commonid 'comm'   /* chunkid for common chunk */

typedef struct {
  id             chunkid;
  long           chunksize;

  short          numchannels;
  unsigned long  numsampleframes;
  short          samplesize;
  extended       samplerate;
} commonchunk;

the id is always comm. the chunksize field is the number of bytes in the chunk.this does not include the 8 bytes used by id and size fields. for the common chunk,chunksize should always 18 since there are no fields of variable length (but to maintaincompatibility with possible future extensions, if the chunksize is > 18, you shouldalways treat those extra bytes as pad bytes).

the numchannels field contains the number of audio channels for the sound. a value of 1means monophonic sound, 2 means stereo, 4 means four channel sound, etc. any number ofaudio channels may be represented. for multichannel sounds, single sample points from eachchannel are interleaved. a set of interleaved sample points is called a sample frame.

the actual waveform data is stored in another chunk, the sound data chunk, which willbe described later.

the numsampleframes field contains the number of sample frames. this is not necessarilythe same as the number of bytes nor the number of sample points in the sound data chunk(ie, it won't be unless you're dealing with a mono waveform). the total number of samplepoints in the file is numsampleframes times numchannels.

the samplesize is the number of bits in each sample point. it can be any number from 1to 32.

the samplerate field is the sample rate at which the sound is to be played back insample frames per second.

one, and only one, common chunk is required in every form aiff.

sound data chunk

the sound data chunk contains the actual sample frames (ie, all channels of waveformdata).

#define sounddataid 'ssnd'  /* chunk id for sound data chunk */

typedef struct {
  id             chunkid;
  long           chunksize;

  unsigned long  offset;
  unsigned long  blocksize;
  unsigned char  waveformdata[];
}  sounddatachunk;

the id is always ssnd. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields nor any possible pad byte needed to makethe chunk an even size (ie, chunksize is the number of remaining bytes in the chunk afterthe chunksize field, not counting any trailing pad byte).

you can determine how many bytes of actual waveform data there is by subtracting 8 fromthe chunksize. remember that the number of sample frames, bit resolution, and otherinformation is gotten from the common chunk.

the offset field determines where the first sample frame in the waveformdata starts.the offset is in bytes. most applications won't use offset and should set it to zero. usefor a non-zero offset is explained in "block-aligning waveform data".

the blocksize is used in conjunction with offset for block-aligning waveform data. itcontains the size in bytes of the blocks that waveform data is aligned to. as with offset,most applications won't use blocksize and should set it to zero. more information onblocksize is in "block-aligning waveform data".

the waveformdata array contains the actual waveform data. the data is arranged intowhat are called sample frames the number of sample frames in waveformdata isdetermined by the numsampleframes field in the common chunk. for more information, see"sample points and sample frames".

the sound data chunk is required unless the numsampleframes field in the common chunkis zero. one, and only one, sound data chunk may appear in a form aiff.

block-aligning waveform data

there may be some applications that, to ensure real time recording and playback ofaudio, wish to align waveform data into fixed-size blocks. this alignment can beaccomplished with the offset and blocksize parameters of the sound data chunk, as shownbelow.

  _________________ ___________________________________ _________________
 |\\\\ unused \\\\\|         sample frames             |\\\\ unused \\\\\|
 |_________________|___________________________________|_________________|
 <-- offset><- numsampleframes sample frames>

 |    blocksize    |                 |                 |                 |
 |<- bytes>|                 |                 |                 |
 |_________________|_________________|_________________|_________________|
      block n-1          block n          block n+1         block n+2

above, the first sample frame starts at the beginning of block n. this is accomplishedby skipping the first offset bytes (ie, some stored pad bytes) of the waveformdata. notethat there may also be pad bytes stored at the end of waveformdata to pad it out so thatit ends upon a block boundary.

the blocksize specifies the size in bytes of the block to which you would align thewaveform data. a blocksize of 0 indicates that the waveform data does not need to beblock-aligned. applications that don't care about block alignment should set the blocksizeand offset to 0 when creating aiff files. applications that write block-aligned waveformdata should set blocksize to the appropriate block size. applications that modify anexisting aiff file should try to preserve alignment of the waveform data, although this isnot required. if an application does not preserve alignment, it should set the blocksizeand offset to 0. if an application needs to realign waveform data to a different sizedblock, it should update blocksize and offset accordingly.

the marker chunk

the marker chunk contains markers that point to positions in the waveform data. markerscan be used for whatever purposes an application desires. the instrument chunk, definedlater in this document, uses markers to mark loop beginning and end points.

a marker structure is as follows:

typedef short  markerid;

typedef struct {
  markerid       id;
  unsigned long  position;
  pstring        markername;
} marker;

the id is a number that uniquely identifies that marker within an aiff. the id can beany positive non-zero integer, as long as no other marker within the same form aiff hasthe same id.

the marker's position in the waveformdata is determined by the position field. markersconceptually fall between two sample frames. a marker that falls before the first sampleframe in the waveform data is at position 0, while a marker that falls between the firstand second sample frame in the waveform data is at position 1. therefore, the units forposition are sample frames, not bytes nor sample points.

sample frames
 ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
|   |   |   |   |   |   |   |   |   |   |   |   |
|___|___|___|___|___|___|___|___|___|___|___|___|
^                   ^                           ^
position 0          position 5                  position 12

the markername field is a pascal-style text string containing the name of the mark.

note: some "ea iff 85" files store strings as c-strings (text bytes followedby a null terminating character) instead of pascal-style strings. audio iff uses pstringsbecause they are more efficiently skipped over when scanning through chunks. usingpstrings, a program can skip over a string by adding the string count to the address ofthe first character. c strings require that each character in the string be examined forthe null terminator.

marker chunk format

the format for the data within a marker chunk is shown below.

#define markerid 'mark'  /* chunkid for marker chunk */

typedef  struct {
  id              chunkid;
  long            chunksize;

  unsigned short  nummarkers;
  marker          markers[];
} markerchunk;

the id is always mark. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields.

the nummarkers field is the number of marker structures in the marker chunk. ifnummarkers is not 0, it is followed by that many marker structures, one after the other.because all fields in a marker structure are an even number of bytes, the length of anymarker will always be even. thus, markers are packed together with no unused bytes betweenthem. the markers need not be placed in any particular order.

the marker chunk is optional. no more than one marker chunk can appear in a form aiff.

the instrument chunk

the instrument chunk defines basic parameters that an instrument, such as a midisampler, could use to play the waveform data.

looping

waveform data can be looped, allowing a portion of the waveform to be repeated in orderto lengthen the sound. the structure below describes a loop.

typedef struct {
  short     playmode;
  markerid  beginloop;
  markerid  endloop;
} loop;

a loop is marked with two points, a begin position and an end position. there are twoways to play a loop, forward looping and forward/backward looping. in the case of forwardlooping, playback begins at the beginning of the waveform, continues past the beginposition and continues to the end position, at which point playback starts again at thebegin position. the segment between the begin and end positions, called the loop segment,is played repeatedly until interrupted by some action, such as a musician releasing a keyon a musical controller.

               ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
sample frames |   |   |   |<--- loop segment>|   |   |   |
              |___|___|___|___|___|___|___|___|___|___|___|___|
                          ^                       ^
                     begin position	        end position

with forward/backward looping, the loop segment is first played from the begin positionto the end position, and then played backwards from the end position to the beginposition. this flip-flop pattern is repeated over and over again until interrupted.

the playmode specifies which type of looping is to be performed:

#define nolooping              0
#define forwardlooping         1
#define forwardbackwardlooping 2

if nolooping is specified, then the loop points are ignored during playback.

the beginloop is a marker id that marks the begin position of the loop segment.

the endloop marks the end position of a loop. the begin position must be less than theend position. if this is not the case, then the loop segment has 0 or negative length andno looping takes place.

the instrument chunk format

the format of the data within an instrument chunk is described below.

#define instrumentid 'inst'  /*chunkid for instruments chunk */

typedef struct {
  id     chunkid;
  long   chunksize;

  char   basenote;
  char   detune;
  char   lownote;
  char   highnote;
  char   lowvelocity;
  char   highvelocity;
  short  gain;
  loop   sustainloop;
  loop   releaseloop;
} instrumentchunk;

the id is always inst. chunksize should always be 20 since there are no fieldsof variable length.

the basenote is the note number at which the instrument plays back the waveform datawithout pitch modification (ie, at the same sample rate that was used when the waveformwas created). units are midi note numbers, and are in the range 0 through 127. middle c is60.

the detune field determines how much the instrument should alter the pitch of the soundwhen it is played back. units are in cents (1/100 of a semitone) and range from -50 to+50. negative numbers mean that the pitch of the sound should be lowered, while positivenumbers mean that it should be raised.

the lownote and highnote fields specify the suggested note range on a keyboard forplayback of the waveform data. the waveform data should be played if the instrument isrequested to play a note between the low and high note numbers, inclusive. the base notedoes not have to be within this range. units for lownote and highnote are midi notevalues.

the lowvelocity and highvelocity fields specify the suggested range of velocities forplayback of the waveform data. the waveform data should be played if the note-on velocityis between low and high velocity, inclusive. units are midi velocity values, 1 (lowestvelocity) through 127 (highest velocity).

the gain is the amount by which to change the gain of the sound when it is played.units are decibels. for example, 0db means no change, 6db means double the value of eachsample point (ie, every additional 6db doubles the gain), while -6db means halve the valueof each sample point.

the sustainloop field specifies a loop that is to be played when an instrument issustaining a sound.

the releaseloop field specifies a loop that is to be played when an instrument is inthe release phase of playing back a sound. the release phase usually occurs after a key onan instrument is released.

the instrument chunk is optional. no more than 1 instrument chunk can appear in oneform aiff.

the midi data chunk

the midi data chunk can be used to store midi data.

the primary purpose of this chunk is to store midi system exclusive messages, althoughother types of midi data can be stored in the chunk as well. as more instruments come tomarket, they will likely have parameters that have not been included in the aiffspecification. the sys ex messages for these instruments may contain many parameters thatare not included in the instrument chunk. for example, a new midi sampler may support morethan the two loops per waveform. these loops will likely be represented in the sys exmessage for the new sampler. this message can be stored in the midi data chunk (ie, so youhave some place to store these extra loop points that may not be used by otherinstruments).

#define mididataid 'midi' /* chunkid for midi data chunk */

typedef struct {
  id             chunkid;
  long           chunksize;

  unsigned char  mididata[];
} mididatachunk;

the id is always midi. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields nor any possible pad byte needed to makethe chunk an even size.

the mididata field contains a stream of midi data. there should be as many bytes aschunksize specifies, plus perhaps a pad byte if needed.

the midi data chunk is optional. any number of these chunks may exist in one form aiff.if midi system exclusive messages for several instruments are to be stored in a form aiff,it is better to use one midi data chunk per instrument than one big midi data chunk forall of the instruments.

the audio recording chunk

the audio recording chunk contains information pertinent to audio recording devices.

#define audiorecording id 'aesd'   /* chunkid for audio recording chunk. */

typedef struct {
  id             chunkid
  long           chunksize;

  unsigned char  aeschannelstatusdata[24];
} audiorecordingchunk;

the id is always aesd. chunksize should always be 24 since there are no fieldsof variable length.

the 24 bytes of aescchannelstatusdata are specified in the "aes recommendedpractice for digital audio engineering - serial transmission format for linearlyrepresented digital audio data", transmission of digital audio between audio devices.this information is duplicated in the audio recording chunk for convenience. of generalinterest would be bits 2, 3, and 4 of byte 0, which describe recording emphasis.

the audio recording chunk is optional. no more than 1 audio recording chunk may appearin one form aiff.

the application specific chunk

the application specific chunk can be used for any purposes whatsoever by developersand application authors. for example, an application that edits sounds might want to usethis chunk to store editor state parameters such as magnification levels, last cursorposition, etc.

#define applicationspecificid 'appl' /* chunkid for application specific chunk. */

typedef struct {
  id       chunkid;
  long     chunksize;

  char     applicationsignature[4];
  char     data[];
} applicationspecificchunk;

the id is always appl. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields nor any possible pad byte needed to makethe chunk an even size.

the applicationsignature field is used by applications which run on platforms fromapple computer, inc. for the apple ii, this field should be set to 'pdos'. for the mac,this field should be set to the application's four character signature as registered withapple technical support.

the data field is the data specific to the application. the application determines howmany bytes are stored here, and what their purpose are. a trailing pad byte must follow ifthat is needed in order to make the chunk an even size.

the application specific chunk is optional. any number of these chunks may exist in aone form aiff.

the comments chunk

the comments chunk is used to store comments in the form aiff. standard iff has anannotation chunk that can also be used for comments, but this new comments chunk has twofields (per comment) not found in the standard iff chunk. they are a time-stamp for thecomment and a link to a marker.

comment structure

a comment structure consists of a time stamp, marker id, and a text count followed bytext.

typedef struct {
  unsigned long   timestamp;
  markerid        marker;
  unsigned short  count;
  char            text[];
} comment;

the timestamp indicates when the comment was created. on the amiga, units are thenumber of seconds since january 1, 1978. on the mac, units are the number of seconds sincejanuary 1, 1904.

a comment can be linked to a marker. this allows applications to store longdescriptions of markers as a comment. if the comment is referring to a marker, then themarker field is the id of that marker. otherwise, marker is 0, indicating that thiscomment is not linked to any marker.

the count is the length of the text that makes up the comment. this is a 16-bitquantity, allowing much longer comments than would be available with a pstring. this countdoes not include any possible pad byte needed to make the comment an even number of bytesin length.

the text field contains the comment itself, followed by a pad byte if needed to makethe text field an even number of bytes.

comments chunk format

#define commentid 'comt'  /* chunkid for comments chunk  */

typedef struct {
  id              chunkid;
  long            chunksize;

  unsigned short  numcomments;
  char            comments[];
}commentschunk;

the id is always comt. chunksize is the number of bytes in the chunk, notcounting the 8 bytes used by id and size fields.

the numcomments field contains the number of comment structures in the chunk. this isfollowed by the comment structures, one after the other. comment structures are alwayseven numbers of bytes in length, so there is no padding needed between structures.

the comments chunk is optional. no more than 1 comments chunk may appear in one formaiff.

the text chunks, name, author, copyright, annotation

these four optional chunks are included in the definition of every standard iff file.

#define nameid 'name'   /* chunkid for name chunk */
#define nameid 'auth'   /* chunkid for author chunk */
#define nameid '(c) '   /* chunkid for copyright chunk */
#define nameid 'anno'   /* chunkid for annotation chunk */

typedef struct {
  id      chunkid;
  long    chunksize;
  char    text[];
}textchunk;

chunksize is the number of bytes in the chunk, not counting the 8 bytes used by id andsize fields nor any possible pad byte needed to make the chunk an even size.

chunk precedence

several of the local chunks for form aiff may contain duplicate information. forexample, the instrument chunk defines loop points and some midi sys ex data in the mididata chunk may also define loop points. what happens if these loop points are different?how is an application supposed to loop the sound? such conflicts are resolved by defininga precedence for chunks. this precedence is illustrated below.

       common chunk              highest precedence
		|
     sound data chunk
            |
       marker chunk
            |
      instrument chunk
            |
       comment chunk
            |
        name chunk
            |
       author chunk
            |
     copyright chunk
            |
     annotation chunk
            |
  audio recording chunk
            |
     midi data chunk
            |
application specific chunk      lowest precedence

the common chunk has the highest precedence, while the application specific chunk hasthe lowest. information in the common chunk always takes precedence over conflictinginformation in any other chunk. the application specific chunk always loses in conflictswith other chunks. by looking at the chunk hierarchy, for example, one sees that the looppoints in the instrument chunk take precedence over conflicting loop points found in themidi data chunk.

it is the responsibility of applications that write data into the lower precedencechunks to make sure that the higher precedence chunks are updated accordingly (ie, so thatconflicts tend not to exist).

errata

the apple iigs sampled instrument format also defines a chunk with id of"inst," which is not the same as the aiff instrument chunk. a good way to tellthe two chunks apart in generic iff-style readers is by the chunksize fields. the aiffinstrument chunk's chunksize field is always 20, whereas the apple iigs sampled instrumentformat instrument chunk's chunksize field, for structural reasons, can never be 20.

storage of aiff on apple and other platforms

on a macintosh, the form aiff, is stored in the data fork of an audio iff file. themacintosh file type of an audio iff file is 'aiff'. this is the same as the formtype ofthe form aiff. macintosh applications should not store any information in audio iff file'sresource fork, as this information may not be preserved by all applications. applicationscan use the application specific chunk, defined later in this document, to store extrainformation specific to their application.

audio iff files may be identified in other apple file systems as well. on a macintoshunder mfs or hfs, the form aiff is stored in the data fork of a file with file type"aiff." this is the same as the formtype of the form aiff.

on an operating system such as ms-dos or unix, where it is customary to use a file nameextension, it is recommended that audio iff file names use ".aif" for theextension.

referring to audio iff

the official name is "audio interchange file format". if an application needsto present the name of this format to a user, such as in a "save as..." dialogbox, the name can be abbreviated to audio iff. referring to audio iff files by afour-letter abbreviation (ie, "aiff") at the user-level should be avoided.

converting extended data to a unsigned long

the sample rate field in a common chunk is expressed as an 80 bit ieee standard 754floating point number. this isn't a very useful format for computer software and audiohardware that can't directly deal with such floating point values. for this reason, youmay wish to use the following convertfloat() hack to convert the floating point value toan unsigned long. this function doesn't handle very large floating point values, butcertainly larger than you would ever expect a typical sampling rate to be. this functionassumes that the passed buffer arg is a pointer to the 10 byte array whichalready contains the 80-bit floating point value. it returns the value (ie, sample rate inhertz) as an unsigned long. note that the fliplong() function is only of use to intelcpu's which have to deal with aiff's big endian order. if not compiling for an intel cpu,comment out the intel_cpu define.

#define intel_cpu

#ifdef intel_cpu
/* *************************** fliplong() ******************************
 * converts a long in "big endian" format (ie, motorola 68000) to intel
 * reverse-byte format, or vice versa if originally in big endian.
 ********************************************************************* */

void fliplong(unsigned char * ptr)
{
   register unsigned char val;

   /* swap 1st and 4th bytes */
   val = *(ptr);
   *(ptr) = *(ptr+3);
   *(ptr+3) = val;

   /* swap 2nd and 3rd bytes */
   ptr += 1;
   val = *(ptr);
   *(ptr) = *(ptr+1);
   *(ptr+1) = val;
}
#endif

/* ************************* fetchlong() *******************************
 * fools the compiler into fetching a long from a char array.
 ********************************************************************* */

unsigned long fetchlong(unsigned long * ptr)
{
   return(*ptr);
}

/* ************************* convertfloat() *****************************
 * converts an 80 bit ieee standard 754 floating point number to an unsigned
 * long.
 ********************************************************************** */

unsigned long convertfloat(unsigned char * buffer)
{
   unsigned long mantissa;
   unsigned long last = 0;
   unsigned char exp;

#ifdef intel_cpu
   fliplong((unsigned long *)(buffer+2));
#endif

   mantissa = fetchlong((unsigned long *)(buffer+2));
   exp = 30 - *(buffer+1);
   while (exp--)
   {
     last = mantissa;
     mantissa >>= 1;
   }
   if (last & 0x00000001) mantissa++;
   return(mantissa);
}

of course, you may need a complementary routine to take a sample rate as an unsignedlong, and put it into a buffer formatted as that 80-bit floating point value.

/* ************************* storelong() ******************************
 * fools the compiler into storing a long into a char array.
 ******************************************************************** */

void storelong(unsigned long val, unsigned long * ptr)
{
   *ptr = val;
}

/* ************************** storefloat() ******************************
 * converts an unsigned long to 80 bit ieee standard 754 floating point
 * number.
 ********************************************************************** */

void storefloat(unsigned char * buffer, unsigned long value)
{
   unsigned long exp;
   unsigned char i;

   memset(buffer, 0, 10);

   exp = value;
   exp >>= 1;
   for (i=0; i<32; i++) { exp>>= 1;
      if (!exp) break;
   }
   *(buffer+1) = i;

   for (i=32; i; i--)
   {
      if (value & 0x80000000) break;
      value <<= 1; } storelong(value, buffer+2); #ifdef intel_cpu fliplong((unsigned long *)(buffer+2)); #endif }

multi-sampling

many midi samplers allow splitting up the midi note range into smaller ranges (for example by octaves) and assigning a
different waveform to play over each range. if you wanted to store all of those waveforms into a single data file, what
you would do is create an iff list (or perhaps cat if you wanted to store forms other than aiff in it) and then include an
embedded form aiff for each one of the waveforms. (ie, each waveform would be in a separate form aiff, and all of these
form aiffs would be in a single list file).

content-type: text/html; charset=iso-8859-1; name="new wave riff chunks.htm"
content-disposition: inline; filename="new wave riff chunks.htm"
content-base: "file:///d|/source/sound/format/new%20w
	ave%20riff%20chunks.htm"

x-mime-autoconverted: from 8bit to quoted-printable by skynet.usb.ve id taa05017

new wave riff chunks

added: 05/01/92
author: microsoft, ibm

most of the information in this section comes directly from the ibm/microsoft riff standard document.

the wave form is defined as follows. programs must expect (and ignore) any unknown chunks encountered, as with all riff forms. however, <'fmt'-ck> must always occur before , and both of these chunks are mandatory in a wave file.

?/font>
riff( 'wave'
<'fmt'-ck> // format
[] // fact chunk
[] // cue points
[] // playlist
[] // associated data list
) // wave data

the wave chunks are described in the following sections.

fact chunk

the stores file dependent information about the contentsof the wave file. this chunk is defined as follows:

?/font> fact( )

<dwsamplelength> represents the length of the data in samples. the <nsamplespersec> field from the wave format header is used in conjunction with the <dwsamplelength> field to determine the length of the data in seconds.

the fact chunk is required for all new wave formats. the chunk is not required for the standard wave_format_pcm files.

the fact chunk will be expanded to include any other information required by future wave formats. added fields will appear following the field. applications can use the chunk size field to determine which fields are present.

cue points chunk

the <cue-ck> cue-points chunk identifies a series of positionsin the waveform data stream. the is defined as follows:

?/font> cue( // count of cue points
... ) // cue-point table

?/font> struct {
dword dwname;
dword dwposition;
fourcc fccchunk;
dword dwchunkstart;
dword dwblockstart;
dword dwsampleoffset;
}

the <cue-point> fields are as follows:

field	description
dwname	specifies the cue point name. each record must have a unique dwname field.
dwposition	specifies the sample position of the cue point. this is the sequential sample number within the play order. see "playlist chunk," later in this document, for a discussion of the play order.
fccchunk	specifies the name or chunk id of the chunk containing the cue point.
dwchunkstart	specifies the position of the start of the data chunk containing the cue point. this should be zero if there is only one chunk containing data (as is currently always the case).
dwblockstart	specifies the position of the start of the block containing the position. this is the byte offset from the start of the data section of the chunk, not the chunk's fourcc.
dwsampleoffset	specifies the sample offset of the cue point relative to the start of the block.

examples of file position values

the following table describes the field values for a wavefile containing a single 'data' chunk:

cue point location	field	value
within pcm data	fccchunk	fourcc value 'data'.
	dwchunkstart	zero value.
	dwblockstart	file position of the sample (nblockalign aligned bytes) relative to the start of the data section of the 'data' chunk (not the fourcc).
	dwsampleoffset	sample position of the cue point relative to the start of the 'data' chunk.
in all other 'data' chunks	fccchunk	fourcc value 'data'.
	dwchunkstart	zero value.
	dwblockstart	file position of the enclosing block relative to the start of the data section of the 'data' chunk (not the fourcc). the software can begin the decompression at this point.
	dwsampleoffset	sample position of the cue point relative to the start of the block.

playlist chunk

the playlist chunk specifies a play order for aseries of cue points. the is defined as follows:

?plst(
// count of play segments
... ) // play-segment table

?struct {
dword dwname;
dword dwlength;
dword dwloops;
}

the <play-segment> fields are as follows:

field	description
dwname	specifies the cue point name. this value must match one of the names listed in the cue-point table.
dwlength	specifies the length of the section in samples.
dwloops	specifies the number of times to play the section.

associated data chunk

the associated data list provides the ability to attachinformation like labels to sections of the waveform data stream. the is defined asfollows:

?/font> list( 'adtl'

// label

// note

} // text with data length

?/font> labl(

)

?/font> note(

)

?/font> ltxt(

... )

label and note information

the 'labl' and 'note' chunks have similar fields. the'labl' chunk contains a label, or title, to associate with a cue point. the 'note' chunkcontains comment text for a cue point. the fields are as follows:

field	description
dwname	specifies the cue point name. this value must match one of the names listed in the cue-point table.
data	specifies a null-terminated string containing a text label (for the 'labl' chunk) or comment text (for the 'note' chunk).

text with data length information

the "ltxt" chunk contains text that is associatedwith a data segment of specific length. the chunk fields are as follows:

field	description
dwname	specifies the cue point name. this value must match one of the names listed in the cue-point table.
dwsamplelength	specifies the number of samples in the segment of waveform data.
dwpurpose	specifies the type or purpose of the text. for example, can specify a fourcc code like 'scrp' for script text or 'capt' for close-caption text.
wcountry	specifies the country code for the text. see "country codes" for a current list of country codes.
wlanguage, wdialect	specify the language and dialect codes for the text. see "language and dialect codes" for a current list of language and dialect codes.
wcodepage	specifies the code page for the text.

inst (instrument) chunk

added: 12/29/92
author: ibm
defined for: wave form

the wave form is nearly the perfect file format for storing a sampled sound synthesizer's samples. bits per sample, sample rate, number of channels, and complex looping can be specified with current wave subchunks, but a sample's pitch and its desired volume relative to other samples cannot. the optional instrument subchunk defined below fills in these needed parameters:

|| ?/font> inst(

)

bunshiftednote	the midi note number that corresponds to the unshifted pitch of the sample. valid values range from 0 to 127.
chfinetune	the pitch shift adjustment in cents (or 100ths of a semitone) needed to hit bunshiftednote value exactly. chfinetune can be used to compensate for tuning errors in the sampling process. valid values range from -50 to 50.
chgain	the suggested volume setting for the sample in decibels. a value of zero decibels suggests no change in the volume. a value of -6 decibels suggests reducing the amplitude of the sample by two.
blownote and bhigh note	the suggested usable midi note number range of the sample. valid values range from 0 to 127.
blowvelocity and bhighvelocity	the suggested usable midi velocity range of the sample. valid values range from 0 to 127.

smpl (sample) chunk

added: 11/09/93
author: digidesign, sonic foundary, turtle beach
defined for: wave form

the sampled instrument chunk describes the minimum necessary information needed to allow a sampling keyboard to use a wave file as an instrument. samplers which require more information can save their extended information in the sampler specific data section. the is defined as follows:

|| ?/font> smpl(

)

struct

{

dword dwidentifier;

dword dwtype;

dword dwstart;

dword dwend;

dword dwfraction;

dword dwplaycount;

}

the chunk:

dwmanufacturer	specifies the mma manufacturer code for the intended target device. the high byte indicates the number of low order bytes (1 or 3) that are valid for the manufacturer code. for example, this value will be 0x01000013 for digidesign (the mma manufacturer code is one byte, 0x13); whereas 0x03000041 identifies microsoft (the mma manufacturer code is three bytes, 0x00 0x00 0x41). if the sample is not intended for a specific manufacturer, then this field should be set to zero.
dwproduct	specifies the product code of the intended target device for the dwmanufacturer. if the sample is not intended for a specific manufacturer's product, then this field should be set to zero.
dwsampleperiod	specifies the period of one sample in nanoseconds (normally 1/nsamplespersec from the waveformat structure for the riff wave file--however, this field allows fine tuning). for example, 44.1 khz would be specified as 22675 (0x00005893).
dwmidiunitynote	specifies the midi note which will replay the sample at original pitch. this value ranges from 0 to 127 (a value of 60 represents middle c as defined by the mma).
dwmidipitchfraction	specifies the fraction of a semitone up from the specified dwmidiunitynote. a value of 0x80000000 is 1/2 semitone (50 cents); a value of 0x00000000 represents no fine tuning between semitones.
dwsmpteformat	specifies the smpte time format used in the dwsmpteoffset field. possible values are (unrecognized formats should be ignored): 0 - specifies no smpte offset (dwsmpteoffset should also be zero) 24 - specifies 24 frames per second 25 - specifies 25 frames per second 29 - specifies 30 frames per second with frame dropping ('30 drop') 30 - specifies 30 frames per second
dwsmpteoffset	specifies a time offset for the sample if it is to be syncronized or calibrated according to a start time other than 0. the format of this value is 0xhhmmssff. hh is a signed hours value [-23..23]. mm is an unsigned minutes value [0..59]. ss is unsigned seconds value [0..59]. ff is an unsigned value [0..( - 1)].
csampleloops	specifies the number (count) of records that are contained in the chunk. the records are stored immediately following the cbsamplerdata field.
cbsamplerdata	specifies the size in bytes of the optional . sampler specific data is stored imediately following the records. the cbsamplerdata field will be zero if no extended sampler specific information is stored in the chunk.

the structure:

dwidentifier	identifies the unique 'name' of the loop. this field may correspond with a name stored in the chunk. the name data is stored in the chunk.
dwtype	specifies the loop type: 0 - loop forward (normal) 1 - alternating loop (forward/backward) 2 - loop backward 3-31 - reserved for future standard types 32-? - sampler specific types (manufacturer defined)
dwstart	specifies the startpoint of the loop in samples.
dwend	specifies the endpoint of the loop in samples (this sample will also be played).
dwfraction	allows fine-tuning for loop fractional areas between samples. values range from 0x00000000 to 0xffffffff. a value of 0x80000000 represents 1/2 of a sample length.
dwplaycount	specifies the number of times to play the loop. a value of 0 specifies an infinite sustain loop.