Preface
In Audio Codecs Based on Core Audio (Part 1): Audio Decoding, we introduced the common data structures and basic concepts of Core Audio. If you haven't read it yet, it is worth reading first.
The way Core Audio represents audio data is not as simple as telling you "hi, this is an mp3 file." There is a big difference between the file format and the format of the audio data inside the file.
Much about formats may look arbitrary, but Audio File Services provides an interesting function called AudioFileGetGlobalInfo. It returns information not about an individual file, but about how Core Audio handles audio files in general. Here is the information AudioFileGetGlobalInfo can be queried for:
kAudioFileGlobalInfo_ReadableTypes
kAudioFileGlobalInfo_WritableTypes
kAudioFileGlobalInfo_FileTypeName
kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat
kAudioFileGlobalInfo_AvailableFormatIDs
kAudioFileGlobalInfo_AllExtensions
kAudioFileGlobalInfo_AllHFSTypeCodes
kAudioFileGlobalInfo_AllUTIs
kAudioFileGlobalInfo_AllMIMETypes
kAudioFileGlobalInfo_ExtensionsForType
kAudioFileGlobalInfo_HFSTypeCodesForType
kAudioFileGlobalInfo_UTIsForType
kAudioFileGlobalInfo_MIMETypesForType
kAudioFileGlobalInfo_TypesForMIMEType
kAudioFileGlobalInfo_TypesForUTI
kAudioFileGlobalInfo_TypesForHFSTypeCode
kAudioFileGlobalInfo_TypesForExtension
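Most of these properties follow the same calling pattern: pass a specifier describing what you are asking about, and read back a value or an array. As a minimal sketch, here is how kAudioFileGlobalInfo_FileTypeName can be queried to get a human-readable name for a file type; note that the returned CFString should be released by the caller:

UInt32 file_type = kAudioFileAIFFType;
CFStringRef name = nullptr;
UInt32 prop_size = sizeof(name);
OSStatus err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_FileTypeName,
                                      sizeof(file_type), &file_type,
                                      &prop_size, &name);
if (err == noErr && name != nullptr) {
    char buf[64];
    if (CFStringGetCString(name, buf, sizeof(buf), kCFStringEncodingUTF8))
        cout << "file type name: " << buf << endl; // e.g. "AIFF"
    CFRelease(name); // the caller owns the returned CFString
}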
Take kAudioFileGlobalInfo_AvailableFormatIDs, for example: given a file type (AudioFileTypeID), it returns an array of format IDs, i.e. the data formats that this file type supports.
Here is an example of using AudioFileGetGlobalInfo to retrieve such information. Suppose we want to know which formats are supported when the file type is kAudioFileMPEG4Type; we can do the following:
OSStatus err;
UInt32 file_type = kAudioFileMPEG4Type;
UInt32 size;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(UInt32),
                                 &file_type,
                                 &size);
auto* formats = (UInt32*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                             sizeof(UInt32),
                             &file_type,
                             &size,
                             formats);
int format_cnt = size / sizeof(UInt32);
for(int i = 0; i < format_cnt; ++i){
    // the FourCC is not NUL-terminated, so print exactly 4 characters
    UInt32 format4cc = CFSwapInt32HostToBig(formats[i]);
    cout << i << ": mFormatId: " << std::string((char*)&format4cc, 4) << endl;
}
free(formats);
The code prints over a dozen entries; kAudioFileMPEG4Type supports quite a rich set of data formats:
0: mFormatId: .mp1
1: mFormatId: .mp2
2: mFormatId: .mp3
3: mFormatId: aac
4: mFormatId: aace
5: mFormatId: aacf
6: mFormatId: aacg
7: mFormatId: aach
8: mFormatId: aac
9: mFormatId: aacp
10: mFormatId: ac-3
11: mFormatId: alac
12: mFormatId: ec-3
13: mFormatId: usac
What about kAudioFileAIFFType? It supports a single format:
0: mFormatId: lpcm
Another example is kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat. Given a file type (AudioFileTypeID) and a format ID, it returns an array of AudioStreamBasicDescription with the following fields filled in: mFormatID, mFormatFlags, and mBitsPerChannel. This information is very helpful when writing files; after all, you don't want to dig through piles of documentation for it.
AudioFileTypeAndFormatID file_type_and_format_id;
file_type_and_format_id.mFileType = kAudioFileAIFFType;
file_type_and_format_id.mFormatID = kAudioFormatLinearPCM;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                                 sizeof(file_type_and_format_id),
                                 &file_type_and_format_id,
                                 &size);
auto *asbds = (AudioStreamBasicDescription*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                             sizeof(file_type_and_format_id),
                             &file_type_and_format_id,
                             &size,
                             asbds);
int asbd_count = size / sizeof(AudioStreamBasicDescription);
for(int i = 0; i < asbd_count; ++i){
    // only mFormatID, mFormatFlags and mBitsPerChannel are filled in by this property
    UInt32 format4cc = CFSwapInt32HostToBig(asbds[i].mFormatID);
    cout << i << ": mFormatId: " << std::string((char*)&format4cc, 4)
         << ", mFormatFlags: " << asbds[i].mFormatFlags
         << ", mBitsPerChannel: " << asbds[i].mBitsPerChannel << endl;
}
free(asbds);
In the code above, the file type is kAudioFileAIFFType and the data format is kAudioFormatLinearPCM; the output is:
0: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 8
1: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 16
2: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 24
3: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 32
The output shows that 8-, 16-, 24-, and 32-bit data are supported. mFormatFlags = 14 is 0x2 + 0x4 + 0x8, i.e. kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked.
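To make the flag arithmetic concrete, here is a small sketch that decodes an mFormatFlags value by testing each bit (the value 14 is taken from the AIFF output above):

UInt32 flags = 14; // from the AIFF output above
cout << "big endian: "  << ((flags & kAudioFormatFlagIsBigEndian) != 0)
     << ", signed int: " << ((flags & kAudioFormatFlagIsSignedInteger) != 0)
     << ", packed: "     << ((flags & kAudioFormatFlagIsPacked) != 0) << endl;
// prints: big endian: 1, signed int: 1, packed: 1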
Audio Encoding
In the preface we saw how to retrieve information with AudioFileGetGlobalInfo. This matters a great deal for audio encoding, because encoding follows these steps:
- Decide on the file type. What kind of file do you want: wav, aiff, or aac?
- Decide on the data format. Different file types support different data formats; this can be determined with AudioFileGetGlobalInfo and kAudioFileGlobalInfo_AvailableFormatIDs.
- Decide on suitable mFormatFlags and mBitsPerChannel. Choosing the right flags and bit depth ensures the file can be opened without errors; this can be determined with AudioFileGetGlobalInfo and kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat, as the sketch after this list shows.
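Putting the steps together, it can be handy to verify a candidate combination before opening a file for writing. The helper below is only a sketch; FileTypeSupportsFormat is an illustrative name of mine, not a Core Audio API:

// Returns true if format_id appears among the format IDs supported by file_type.
static bool FileTypeSupportsFormat(AudioFileTypeID file_type, UInt32 format_id)
{
    UInt32 size = 0;
    OSStatus err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                              sizeof(file_type), &file_type, &size);
    if (err != noErr) return false;
    std::vector<UInt32> formats(size / sizeof(UInt32));
    err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(file_type), &file_type, &size, formats.data());
    if (err != noErr) return false;
    for (UInt32 f : formats)
        if (f == format_id) return true;
    return false;
}

For example, FileTypeSupportsFormat(kAudioFileMPEG4Type, kAudioFormatMPEG4AAC) should return true, while the same query against kAudioFileAIFFType should not.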
Show me the code
Enough talk; here is the code, with a detailed explanation afterwards.
#include <AudioToolbox/AudioToolbox.h>
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

int main(int argc, char* argv[])
{
    AudioFileTypeID file_type = kAudioFileMPEG4Type;
    int o_channels = 2;
    double o_sr = 44100;
    AudioStreamBasicDescription output_asbd;
    memset(&output_asbd, 0, sizeof(output_asbd));
    output_asbd.mSampleRate = o_sr;
    output_asbd.mChannelsPerFrame = o_channels;
    output_asbd.mFormatID = kAudioFormatMPEG4AAC;
    // let Core Audio fill in the remaining fields of the compressed format
    UInt32 size = sizeof(output_asbd);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);
    // open the output file; createCFURLWithStdString is a helper (defined elsewhere)
    // that builds a CFURLRef from a path string
    CFURLRef output_url = createCFURLWithStdString("sin440.aac");
    ExtAudioFileRef output_file;
    OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                                &output_asbd, nullptr,
                                                kAudioFileFlags_EraseFile,
                                                &output_file);
    assert(status == noErr);
    // client format: two-channel interleaved float32 PCM
    double i_sr = 44100;
    int i_channels = 2;
    AudioStreamBasicDescription input_asbd;
    FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
    status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                     sizeof(input_asbd), &input_asbd);
    assert(status == noErr);
    const int num_frame_out_per_block = 1024;
    AudioBufferList outputData;
    outputData.mNumberBuffers = 1;
    outputData.mBuffers[0].mNumberChannels = i_channels;
    outputData.mBuffers[0].mDataByteSize = sizeof(float)*num_frame_out_per_block*i_channels;
    std::vector<float> buffer(num_frame_out_per_block * i_channels);
    outputData.mBuffers[0].mData = buffer.data();
    float t = 0;
    float tincr = 2 * M_PI * 440.0f / i_sr;
    for(int i = 0; i < 200; ++i){
        for(int j = 0; j < num_frame_out_per_block; ++j){
            buffer[j * i_channels] = sin(t);                      // left channel
            buffer[j * i_channels + 1] = buffer[j * i_channels];  // right = left
            t += tincr;
        }
        // write one block of frames
        status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
        assert(status == noErr);
    }
    CFRelease(output_url);
    ExtAudioFileDispose(output_file);
    return 0;
}
First we create an AudioStreamBasicDescription and set its sample rate, channel count, and data format (kAudioFormatMPEG4AAC, since the file type will be kAudioFileMPEG4Type); everything else is zeroed out. Then we call AudioFormatGetProperty to fill in the remaining fields. If the format is kAudioFormatLinearPCM, though, you are better off filling in the fields with FillOutASBDForLPCM.
AudioFileTypeID file_type = kAudioFileMPEG4Type;
AudioStreamBasicDescription output_asbd;
memset(&output_asbd, 0, sizeof(output_asbd));
output_asbd.mSampleRate = o_sr;
output_asbd.mChannelsPerFrame = o_channels;
output_asbd.mFormatID = kAudioFormatMPEG4AAC;
UInt32 size = sizeof(output_asbd);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);
Next, ExtAudioFileCreateWithURL creates and opens the file; kAudioFileFlags_EraseFile means that an existing file will be overwritten.
CFURLRef output_url = createCFURLWithStdString("sin440.aac");
ExtAudioFileRef output_file;
OSStatus status = ExtAudioFileCreateWithURL(output_url,file_type,
&output_asbd, nullptr,
kAudioFileFlags_EraseFile,
&output_file);
The next step is important: set the client data format with ExtAudioFileSetProperty. It tells the encoder what format the incoming audio data will be in; in this example, two-channel interleaved float.
AudioStreamBasicDescription input_asbd;
FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                 sizeof(input_asbd), &input_asbd);
Then we create an AudioBufferList to hold the audio data. Because the data is interleaved float, mNumberBuffers = 1.
const int num_frame_out_per_block = 1024;
AudioBufferList outputData;
outputData.mNumberBuffers = 1;
outputData.mBuffers[0].mNumberChannels = i_channels;
outputData.mBuffers[0].mDataByteSize = sizeof(float)*num_frame_out_per_block*i_channels;
std::vector<float> buffer(num_frame_out_per_block * i_channels);
outputData.mBuffers[0].mData = buffer.data();
Next we write the audio data; the example writes a 440 Hz sine wave.
float t = 0;
float tincr = 2 * M_PI * 440.0f / i_sr;
for(int i = 0; i < 200; ++i){
for(int j = 0; j < num_frame_out_per_block; ++j){
buffer[j * i_channels] = sin(t);
buffer[j * i_channels + 1] = buffer[j * i_channels];
t += tincr;
}
// write audio block
status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
}
Finally, don't forget to release the resources.
CFRelease(output_url);
ExtAudioFileDispose(output_file);
Q&A
How should planar input data be handled?
When kAudioFormatFlagIsNonInterleaved is set, the data is planar, and the Core Audio headers carry a dedicated comment about this case:
// Typically, when an ASBD is being used, the fields describe the complete layout
// of the sample data in the buffers that are represented by this description -
// where typically those buffers are represented by an AudioBuffer that is
// contained in an AudioBufferList.
//
// However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
// AudioBufferList has a different structure and semantic. In this case, the ASBD
// fields will describe the format of ONE of the AudioBuffers that are contained in
// the list, AND each AudioBuffer in the list is determined to have a single (mono)
// channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
// total number of AudioBuffers that are contained within the AudioBufferList -
// where each buffer contains one channel. This is used primarily with the
// AudioUnit (and AudioConverter) representation of this list - and won't be found
// in the AudioHardware usage of this structure.
In this case the semantics of the AudioBufferList change, and it is used roughly as follows:
int i_channels = 2;
const int num_frame_out_per_block = 1024;
// allocate an AudioBufferList large enough for one AudioBuffer per channel
AudioBufferList *outputData = (AudioBufferList*)malloc(sizeof(AudioBufferList)
                                                       + sizeof(AudioBuffer) * (i_channels - 1));
// for non-interleaved (planar) data, mNumberBuffers is the number of channels
outputData->mNumberBuffers = i_channels;
for(auto i = 0; i < i_channels; ++i){
    outputData->mBuffers[i].mNumberChannels = 1;  // one mono channel per buffer
    outputData->mBuffers[i].mDataByteSize = sizeof(float) * num_frame_out_per_block;
    outputData->mBuffers[i].mData = new float[num_frame_out_per_block];
}
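To feed such planar buffers to the encoder from the earlier example, the client format must be planar as well. A minimal sketch, reusing output_file, outputData, and num_frame_out_per_block from the snippets above; the only change from the interleaved case is the final inIsNonInterleaved argument:

AudioStreamBasicDescription planar_asbd;
// last argument inIsNonInterleaved = true -> planar float32
FillOutASBDForLPCM(planar_asbd, 44100, i_channels, 32, 32, true, false, true);
OSStatus status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                          sizeof(planar_asbd), &planar_asbd);
// fill outputData->mBuffers[0..i_channels-1].mData with one channel each, then:
status = ExtAudioFileWrite(output_file, num_frame_out_per_block, outputData);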
Summary
The most important part of encoding audio files with Core Audio is finding the right AudioStreamBasicDescription. With AudioFileGetGlobalInfo you can start from the file type, find a suitable data format, and finally arrive at a suitable AudioStreamBasicDescription. The rest of the work can be handed to ExtAudioFile, which gets it done simply and efficiently.