How is MP3 built? (CSDN Blog)

Article link: https://blog.csdn.net/wangmj518/article/details/3354766

Most people with a little knowledge of MP3 files know that the sound is divided into smaller parts and compressed with a psychoacoustic model. These smaller pieces of audio are then put into so-called 'frames', small data blocks that each begin with a header. I'll focus on that header in this text.

The header is 4 bytes (32 bits) long and begins with the sync. According to the MPEG standard, the sync is 12 set bits in a row. Some later, unofficial extensions (such as MPEG-2.5) use 11 set bits followed by one cleared bit instead. The sync is directly followed by an ID bit, indicating whether the file is an MPEG-1 or an MPEG-2 file: 0 = MPEG-2 and 1 = MPEG-1.
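A minimal Python sketch of the sync check and ID bit, assuming the 4 header bytes have already been read into a `bytes` object. The bit positions (12 sync bits at the top of the big-endian 32-bit word, ID bit right after them) follow the standard MPEG audio header layout, which this text does not spell out. The function name `parse_sync_and_id` is hypothetical, and the sketch deliberately rejects the 11-bit MPEG-2.5 sync.

```python
def parse_sync_and_id(header: bytes) -> str:
    """Return "MPEG-1" or "MPEG-2" for a 4-byte frame header.

    Assumes the standard layout: 12 sync bits at the top of the
    big-endian 32-bit word, followed directly by the ID bit.
    """
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")  # the header is stored big-endian
    if (word >> 20) != 0xFFF:  # top 12 bits must all be set
        raise ValueError("no frame sync: first 12 bits are not all set")
    id_bit = (word >> 19) & 1  # the bit right after the sync
    return "MPEG-1" if id_bit else "MPEG-2"


print(parse_sync_and_id(bytes([0xFF, 0xFB, 0x90, 0x64])))  # MPEG-1
print(parse_sync_and_id(bytes([0xFF, 0xF3, 0x90, 0x64])))  # MPEG-2
```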

The layer is given by the two layer bits, which are oddly coded in reverse order:

00	Not defined
01	Layer III
10	Layer II
11	Layer I
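Because of the reversed coding, the layer number is simply 4 minus the value of the two bits. A minimal sketch, assuming the layer bits sit directly after the ID bit as in the standard header layout; `layer_from_header` is a hypothetical name.

```python
def layer_from_header(header: bytes) -> int:
    """Return the layer (1, 2 or 3) from a 4-byte MPEG audio frame header."""
    word = int.from_bytes(header, "big")
    bits = (word >> 17) & 0b11  # the two layer bits, right after the ID bit
    if bits == 0b00:
        raise ValueError("layer bits 00 are not defined")
    # Reversed coding: 01 -> Layer III, 10 -> Layer II, 11 -> Layer I.
    return 4 - bits


print(layer_from_header(bytes([0xFF, 0xFB, 0x90, 0x64])))  # 3
```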

With this information and the information in the bitrate field we can determine the bitrate of the audio (in kbit/s) according to this table.

Bitrate index	MPEG-1, Layer I	MPEG-1, Layer II	MPEG-1, Layer III	MPEG-2, Layer I	MPEG-2, Layer II	MPEG-2, Layer III
0000	free	free	free	free	free	free
0001	32	32	32	32	8	8
0010	64	48	40	48	16	16
0011	96	56	48	56	24	24
0100	128	64	56	64	32	32
0101	160	80	64	80	40	40
0110	192	96	80	96	48	48
0111	224	112	96	112	56	56
1000	256	128	112	128	64	64
1001	288	160	128	144	80	80
1010	320	192	160	160	96	96
1011	352	224	192	176	112	112
1100	384	256	224	192	128	128
1101	416	320	256	224	144	144
1110	448	384	320	256	160	160
1111	not allowed	not allowed	not allowed	not allowed	not allowed	not allowed
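A lookup sketch for the bitrate field, assuming the field occupies bits 15 to 12 of the header word (standard layout). The values follow the MPEG-1 tables and the MPEG-2 lower-sampling-frequency tables, where Layers II and III share one column; index 0000 ("free format") and 1111 (not allowed) are stored as `None`. `BITRATES` and `bitrate_kbps` are hypothetical names, and the caller is assumed to have decoded the version and layer already.

```python
# Bitrates in kbit/s, indexed by the 4-bit bitrate field.
BITRATES = {
    ("MPEG-1", 1): [None, 32, 64, 96, 128, 160, 192, 224,
                    256, 288, 320, 352, 384, 416, 448, None],
    ("MPEG-1", 2): [None, 32, 48, 56, 64, 80, 96, 112,
                    128, 160, 192, 224, 256, 320, 384, None],
    ("MPEG-1", 3): [None, 32, 40, 48, 56, 64, 80, 96,
                    112, 128, 160, 192, 224, 256, 320, None],
    ("MPEG-2", 1): [None, 32, 48, 56, 64, 80, 96, 112,
                    128, 144, 160, 176, 192, 224, 256, None],
    ("MPEG-2", 2): [None, 8, 16, 24, 32, 40, 48, 56,
                    64, 80, 96, 112, 128, 144, 160, None],
    ("MPEG-2", 3): [None, 8, 16, 24, 32, 40, 48, 56,
                    64, 80, 96, 112, 128, 144, 160, None],
}


def bitrate_kbps(header: bytes, version: str, layer: int) -> int:
    """Look up the bitrate (kbit/s) of a frame with known version and layer."""
    word = int.from_bytes(header, "big")
    index = (word >> 12) & 0xF  # the 4-bit bitrate field
    kbps = BITRATES[(version, layer)][index]
    if kbps is None:
        raise ValueError(f"bitrate index {index:04b} is free format or invalid")
    return kbps


print(bitrate_kbps(bytes([0xFF, 0xFB, 0x90, 0x64]), "MPEG-1", 3))  # 128
```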

The sample rate is given by the frequency field. Its values depend on which MPEG version is used, according to the following table.

Frequency value	MPEG-1	MPEG-2
00	44100 Hz	22050 Hz
01	48000 Hz	24000 Hz
10	32000 Hz	16000 Hz
11	reserved	reserved
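The same kind of lookup works for the frequency field. This sketch assumes the field sits in bits 11 and 10 of the header word (standard layout) and that the version string has already been decoded from the ID bit; `sample_rate_hz` is a hypothetical name.

```python
# Sample rates in Hz, indexed by the 2-bit frequency field (11 is reserved).
SAMPLE_RATES = {
    "MPEG-1": [44100, 48000, 32000, None],
    "MPEG-2": [22050, 24000, 16000, None],
}


def sample_rate_hz(header: bytes, version: str) -> int:
    """Return the sample rate encoded in a 4-byte frame header."""
    word = int.from_bytes(header, "big")
    index = (word >> 10) & 0b11  # the 2-bit frequency field
    rate = SAMPLE_RATES[version][index]
    if rate is None:
        raise ValueError("frequency value 11 is reserved")
    return rate


print(sample_rate_hz(bytes([0xFF, 0xFB, 0x90, 0x64]), "MPEG-1"))  # 44100
```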

Three bits are not needed in the decoding process at all: the copyright bit, the original/home bit and the private bit. The copyright bit has the same meaning as the copyright bit on CDs and DAT tapes, i.e. if it is set, copying the contents is not allowed. The original/home bit indicates, if set, that the frame is located on its original media. The standard defines no meaning for the private bit; it is left for private, application-specific use.
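Since a decoder ignores these three bits, a parser usually just reports them. A minimal sketch, assuming the standard bit positions (private bit = bit 8, copyright = bit 3, original/home = bit 2 of the header word); `informational_flags` is a hypothetical name.

```python
def informational_flags(header: bytes) -> dict:
    """Return the three bits a decoder can ignore, as booleans."""
    word = int.from_bytes(header, "big")
    return {
        "private": bool((word >> 8) & 1),
        "copyright": bool((word >> 3) & 1),
        "original": bool((word >> 2) & 1),
    }


print(informational_flags(bytes([0xFF, 0xFB, 0x90, 0x64])))
# {'private': False, 'copyright': False, 'original': True}
```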

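If the protection bit is NOT set, the frame header is followed by a 16-bit checksum (CRC), inserted before the audio data. If the padding bit is set, the frame is padded with one extra byte. Knowing this, the size of the complete frame in bytes can be calculated with the following formula, where BitRate is in bits per second and SampleRate in Hz. This form applies to Layer II and to MPEG-1 Layer III; Layer I and MPEG-2 Layer III use different constants.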

FrameSize = 144 * BitRate / SampleRate

when the padding bit is cleared and

FrameSize = (144 * BitRate / SampleRate) + 1

when the padding bit is set.

FrameSize is of course an integer: the result of the division is truncated. For example, if BitRate = 128000, SampleRate = 44100 and the padding bit is cleared, then FrameSize = 144 * 128000 / 44100 = 417.96..., which truncates to 417 bytes.
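The formula and the two related header bits can be sketched as follows. The sketch assumes the standard positions of the protection bit (bit 16) and the padding bit (bit 9), and it only covers the 144-constant case described above (Layer II and MPEG-1 Layer III). The names `frame_size_bytes`, `padding_bit` and `has_crc` are hypothetical.

```python
def padding_bit(header: bytes) -> bool:
    """True if the frame carries one extra padding byte."""
    return bool((int.from_bytes(header, "big") >> 9) & 1)


def has_crc(header: bytes) -> bool:
    """True if a 16-bit CRC follows the header (protection bit NOT set)."""
    return ((int.from_bytes(header, "big") >> 16) & 1) == 0


def frame_size_bytes(bitrate_bps: int, sample_rate: int, padding: bool) -> int:
    """Frame size for Layer II / MPEG-1 Layer III; integer division truncates."""
    return 144 * bitrate_bps // sample_rate + (1 if padding else 0)


print(frame_size_bytes(128000, 44100, False))  # 417
print(frame_size_bytes(128000, 44100, True))   # 418
```

Integer division (`//`) is used so the truncation happens exactly as described in the text, without going through a floating-point value.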

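The mode field tells which kind of stereo/mono encoding has been used. The mode extension field is only used in joint stereo mode, and its meaning differs between layers: in Layers I and II it selects the subband from which intensity stereo starts, while in Layer III it tells whether intensity stereo and/or mid/side (M/S) stereo are switched on.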

Mode value	mode
00	Stereo
01	Joint stereo
10	Dual channel
11	Mono
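Decoding the mode is another 2-bit lookup. This sketch assumes the standard positions (mode = bits 7 and 6, mode extension = bits 5 and 4) and returns the mode extension as a raw number, since its interpretation depends on the layer. `channel_mode`, `mode_extension` and `channel_count` are hypothetical names.

```python
MODES = ["Stereo", "Joint stereo", "Dual channel", "Mono"]


def channel_mode(header: bytes) -> str:
    """Return the channel mode named in the table above."""
    return MODES[(int.from_bytes(header, "big") >> 6) & 0b11]


def mode_extension(header: bytes) -> int:
    """Raw 2-bit mode extension; only meaningful in joint stereo."""
    return (int.from_bytes(header, "big") >> 4) & 0b11


def channel_count(mode: str) -> int:
    """Mono frames carry one channel, every other mode carries two."""
    return 1 if mode == "Mono" else 2


hdr = bytes([0xFF, 0xFB, 0x90, 0x64])
print(channel_mode(hdr), mode_extension(hdr))  # Joint stereo 2
```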

The last field is the emphasis field. It is used to 're-equalize' the sound after a Dolby-like noise suppression. It is rarely used, and that will probably never change. The following noise suppression models are used: