Preface
In Audio Codecs Based on Core Audio (Part 1): Audio Decoding, we introduced the common data structures and basic concepts of Core Audio. If you haven't read it yet, it is worth reading first.
The way Core Audio represents audio data is not as simple as telling you "hi, this is an mp3 file." There is a big difference between the file format and the format of the audio data inside the file.
Much about formats may look arbitrary, but Audio File Services provides an interesting function called AudioFileGetGlobalInfo. It returns information not about an individual file, but about how Core Audio handles audio files in general. Here is the information AudioFileGetGlobalInfo can be queried for:
kAudioFileGlobalInfo_ReadableTypes
kAudioFileGlobalInfo_WritableTypes
kAudioFileGlobalInfo_FileTypeName
kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat
kAudioFileGlobalInfo_AvailableFormatIDs
kAudioFileGlobalInfo_AllExtensions
kAudioFileGlobalInfo_AllHFSTypeCodes
kAudioFileGlobalInfo_AllUTIs
kAudioFileGlobalInfo_AllMIMETypes
kAudioFileGlobalInfo_ExtensionsForType
kAudioFileGlobalInfo_HFSTypeCodesForType
kAudioFileGlobalInfo_UTIsForType
kAudioFileGlobalInfo_MIMETypesForType
kAudioFileGlobalInfo_TypesForMIMEType
kAudioFileGlobalInfo_TypesForUTI
kAudioFileGlobalInfo_TypesForHFSTypeCode
kAudioFileGlobalInfo_TypesForExtension
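Most of these properties follow the same calling pattern: pass a specifier describing what you are asking about, and read back a value or an array. As a minimal sketch, here is how kAudioFileGlobalInfo_FileTypeName can be queried to get a human-readable name for a file type; note that the returned CFString should be released by the caller:

UInt32 file_type = kAudioFileAIFFType;
CFStringRef name = nullptr;
UInt32 prop_size = sizeof(name);
OSStatus err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_FileTypeName,
                                      sizeof(file_type), &file_type,
                                      &prop_size, &name);
if (err == noErr && name != nullptr) {
    char buf[64];
    if (CFStringGetCString(name, buf, sizeof(buf), kCFStringEncodingUTF8))
        cout << "file type name: " << buf << endl; // e.g. "AIFF"
    CFRelease(name); // the caller owns the returned CFString
}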
Take kAudioFileGlobalInfo_AvailableFormatIDs, for example: given a file type (AudioFileTypeID), it returns an array of format IDs, i.e. the data formats that this file type supports.
Here is an example of using AudioFileGetGlobalInfo to retrieve such information. Suppose we want to know which formats are supported when the file type is kAudioFileMPEG4Type; we can do the following:
OSStatus err;
UInt32 file_type = kAudioFileMPEG4Type;
UInt32 size;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(UInt32),
                                 &file_type,
                                 &size);
auto* formats = (UInt32*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                             sizeof(UInt32),
                             &file_type,
                             &size,
                             formats);
int format_cnt = size / sizeof(UInt32);
for(int i = 0; i < format_cnt; ++i){
    // the FourCC is not NUL-terminated, so print exactly 4 characters
    UInt32 format4cc = CFSwapInt32HostToBig(formats[i]);
    cout << i << ": mFormatId: " << std::string((char*)&format4cc, 4) << endl;
}
free(formats);
The code prints over a dozen entries; kAudioFileMPEG4Type supports quite a rich set of data formats:
0: mFormatId: .mp1
1: mFormatId: .mp2
2: mFormatId: .mp3
3: mFormatId: aac
4: mFormatId: aace
5: mFormatId: aacf
6: mFormatId: aacg
7: mFormatId: aach
8: mFormatId: aac
9: mFormatId: aacp
10: mFormatId: ac-3
11: mFormatId: alac
12: mFormatId: ec-3
13: mFormatId: usac
What about kAudioFileAIFFType? It supports a single format:
0: mFormatId: lpcm
Another example is kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat. Given a file type (AudioFileTypeID) and a format ID, it returns an array of AudioStreamBasicDescription with the following fields filled in: mFormatID, mFormatFlags, and mBitsPerChannel. This information is very helpful when writing files; after all, you don't want to dig through piles of documentation for it.
AudioFileTypeAndFormatID file_type_and_format_id;
file_type_and_format_id.mFileType = kAudioFileAIFFType;
file_type_and_format_id.mFormatID = kAudioFormatLinearPCM;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                                 sizeof(file_type_and_format_id),
                                 &file_type_and_format_id,
                                 &size);
auto *asbds = (AudioStreamBasicDescription*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                             sizeof(file_type_and_format_id),
                             &file_type_and_format_id,
                             &size,
                             asbds);
int asbd_count = size / sizeof(AudioStreamBasicDescription);
for(int i = 0; i < asbd_count; ++i){
    // only mFormatID, mFormatFlags and mBitsPerChannel are filled in by this property
    UInt32 format4cc = CFSwapInt32HostToBig(asbds[i].mFormatID);
    cout << i << ": mFormatId: " << std::string((char*)&format4cc, 4)
         << ", mFormatFlags: " << asbds[i].mFormatFlags
         << ", mBitsPerChannel: " << asbds[i].mBitsPerChannel << endl;
}
free(asbds);
In the code above, the file type is kAudioFileAIFFType and the data format is kAudioFormatLinearPCM; the output is:
0: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 8
1: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 16
2: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 24
3: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 32
The output shows that 8-, 16-, 24-, and 32-bit data are supported. mFormatFlags = 14 is 0x2 + 0x4 + 0x8, i.e. kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked.
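To make the flag arithmetic concrete, here is a small sketch that decodes an mFormatFlags value by testing each bit (the value 14 is taken from the AIFF output above):

UInt32 flags = 14; // from the AIFF output above
cout << "big endian: "  << ((flags & kAudioFormatFlagIsBigEndian) != 0)
     << ", signed int: " << ((flags & kAudioFormatFlagIsSignedInteger) != 0)
     << ", packed: "     << ((flags & kAudioFormatFlagIsPacked) != 0) << endl;
// prints: big endian: 1, signed int: 1, packed: 1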
Audio Encoding
In the preface we saw how to retrieve information with AudioFileGetGlobalInfo. This matters a great deal for audio encoding, because encoding follows these steps:
- Decide on the file type. What kind of file do you want: wav, aiff, or aac?
- Decide on the data format. Different file types support different data formats; this can be determined with AudioFileGetGlobalInfo and kAudioFileGlobalInfo_AvailableFormatIDs.
- Decide on suitable mFormatFlags and mBitsPerChannel. Choosing the right flags and bit depth ensures the file can be opened without errors; this can be determined with AudioFileGetGlobalInfo and kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat, as the sketch after this list shows.
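Putting the steps together, it can be handy to verify a candidate combination before opening a file for writing. The helper below is only a sketch; FileTypeSupportsFormat is an illustrative name of mine, not a Core Audio API:

// Returns true if format_id appears among the format IDs supported by file_type.
static bool FileTypeSupportsFormat(AudioFileTypeID file_type, UInt32 format_id)
{
    UInt32 size = 0;
    OSStatus err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                              sizeof(file_type), &file_type, &size);
    if (err != noErr) return false;
    std::vector<UInt32> formats(size / sizeof(UInt32));
    err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(file_type), &file_type, &size, formats.data());
    if (err != noErr) return false;
    for (UInt32 f : formats)
        if (f == format_id) return true;
    return false;
}

For example, FileTypeSupportsFormat(kAudioFileMPEG4Type, kAudioFormatMPEG4AAC) should return true, while the same query against kAudioFileAIFFType should not.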
Show me the code
Enough talk; here is the code, with a detailed explanation afterwards.
#include <AudioToolbox/AudioToolbox.h>
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

int main(int argc, char* argv[])
{
    AudioFileTypeID file_type = kAudioFileMPEG4Type;
    int o_channels = 2;
    double o_sr = 44100;
    AudioStreamBasicDescription output_asbd;
    memset(&output_asbd, 0, sizeof(output_asbd));
    output_asbd.mSampleRate = o_sr;
    output_asbd.mChannelsPerFrame = o_channels;
    output_asbd.mFormatID = kAudioFormatMPEG4AAC;
    // let Core Audio fill in the remaining fields of the compressed format
    UInt32 size = sizeof(output_asbd);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);
    // open the output file; createCFURLWithStdString is a helper (defined elsewhere)
    // that builds a CFURLRef from a path string
    CFURLRef output_url = createCFURLWithStdString("sin440.aac");
    ExtAudioFileRef output_file;
    OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                                &output_asbd, nullptr,
                                                kAudioFileFlags_EraseFile,
                                                &output_file);
    assert(status == noErr);
    // client format: two-channel interleaved float32 PCM
    double i_sr = 44100;
    int i_channels = 2;
    AudioStreamBasicDescription input_asbd;
    FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
    status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                     sizeof(input_asbd), &input_asbd);
    assert(status == noErr);
    const int num_frame_out_per_block = 1024;
    AudioBufferList outputData;
    outputData.mNumberBuffers = 1;
    outputData.mBuffers[0].mNumberChannels = i_channels;
    outputData.mBuffers[0].mDataByteSize = sizeof(float)*num_frame_out_per_block*i_channels;
    std::vector<float> buffer(num_frame_out_per_block * i_channels);
    outputData.mBuffers[0].mData = buffer.data();
    float t = 0;
    float tincr = 2 * M_PI * 440.0f / i_sr;
    for(int i = 0; i < 200; ++i){
        for(int j = 0; j < num_frame_out_per_block; ++j){
            buffer[j * i_channels] = sin(t);                      // left channel
            buffer[j * i_channels + 1] = buffer[j * i_channels];  // right = left
            t += tincr;
        }
        // write one block of frames
        status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
        assert(status == noErr);
    }
    CFRelease(output_url);
    ExtAudioFileDispose(output_file);
    return 0;
}
First we create an AudioStreamBasicDescription and set its sample rate, channel count, and data format (kAudioFormatMPEG4AAC, since the file type will be kAudioFileMPEG4Type); everything else is zeroed out. Then we call AudioFormatGetProperty to fill in the remaining fields. If the format is kAudioFormatLinearPCM, though, you are better off filling in the fields with FillOutASBDForLPCM.
AudioFileTypeID file_type = kAudioFileMPEG4Type;
AudioStreamBasicDescription output_asbd;
memset(&output_asbd, 0, sizeof(output_asbd));
output_asbd.mSampleRate = o_sr;
output_asbd.mChannelsPerFrame = o_channels;
output_asbd.mFormatID = kAudioFormatMPEG4AAC;
UInt32 size = sizeof(output_asbd);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);
Next, ExtAudioFileCreateWithURL creates and opens the file; kAudioFileFlags_EraseFile means that an existing file will be overwritten.
CFURLRef output_url = createCFURLWithStdString("sin440.aac");
ExtAudioFileRef output_file;
OSStatus status = ExtAudioFileCreateWithURL(output_url,file_type,
&output_asbd, nullptr,
kAudioFileFlags_EraseFile,
&output_file);
The next step is important: set the client data format with ExtAudioFileSetProperty. It tells the encoder what format the incoming audio data will be in; in this example, two-channel interleaved float.
AudioStreamBasicDescription input_asbd;
FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                 sizeof(input_asbd), &input_asbd);
Then we create an AudioBufferList to hold the audio data. Because the data is interleaved float, mNumberBuffers = 1.
const int num_frame_out_per_block = 1024;
AudioBufferList outputData;
outputData.mNumberBuffers = 1;
outputData.mBuffers[0].mNumberChannels = i_channels;
outputData.mBuffers[0].mDataByteSize = sizeof(float)*num_frame_out_per_block*i_channels;
std::vector<float> buffer(num_frame_out_per_block * i_channels);
outputData.mBuffers[0].mData = buffer.data();
Next we write the audio data; the example writes a 440 Hz sine wave.
float t = 0;
float tincr = 2 * M_PI * 440.0f / i_sr;
for(int i = 0; i < 200; ++i){
for(int j = 0; j < num_frame_out_per_block; ++j){
buffer[j * i_channels] = sin(t);
buffer[j * i_channels + 1] = buffer[j * i_channels];
t += tincr;
}
// write audio block
status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
}
Finally, don't forget to release the resources.
CFRelease(output_url);
ExtAudioFileDispose(output_file);
Q&A
How should planar input data be handled?
When kAudioFormatFlagIsNonInterleaved is set, the data is planar, and the Core Audio headers carry a dedicated comment about this case:
// Typically, when an ASBD is being used, the fields describe the complete layout
// of the sample data in the buffers that are represented by this description -
// where typically those buffers are represented by an AudioBuffer that is
// contained in an AudioBufferList.
//
// However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
// AudioBufferList has a different structure and semantic. In this case, the ASBD
// fields will describe the format of ONE of the AudioBuffers that are contained in
// the list, AND each AudioBuffer in the list is determined to have a single (mono)
// channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
// total number of AudioBuffers that are contained within the AudioBufferList -
// where each buffer contains one channel. This is used primarily with the
// AudioUnit (and AudioConverter) representation of this list - and won't be found
// in the AudioHardware usage of this structure.
In this case the semantics of the AudioBufferList change, and it is used roughly as follows:
int i_channels = 2;
const int num_frame_out_per_block = 1024;
// allocate an AudioBufferList large enough for one AudioBuffer per channel
AudioBufferList *outputData = (AudioBufferList*)malloc(sizeof(AudioBufferList)
                                                       + sizeof(AudioBuffer) * (i_channels - 1));
// for non-interleaved (planar) data, mNumberBuffers is the number of channels
outputData->mNumberBuffers = i_channels;
for(auto i = 0; i < i_channels; ++i){
    outputData->mBuffers[i].mNumberChannels = 1;  // one mono channel per buffer
    outputData->mBuffers[i].mDataByteSize = sizeof(float) * num_frame_out_per_block;
    outputData->mBuffers[i].mData = new float[num_frame_out_per_block];
}
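To feed such planar buffers to the encoder from the earlier example, the client format must be planar as well. A minimal sketch, reusing output_file, outputData, and num_frame_out_per_block from the snippets above; the only change from the interleaved case is the final inIsNonInterleaved argument:

AudioStreamBasicDescription planar_asbd;
// last argument inIsNonInterleaved = true -> planar float32
FillOutASBDForLPCM(planar_asbd, 44100, i_channels, 32, 32, true, false, true);
OSStatus status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                          sizeof(planar_asbd), &planar_asbd);
// fill outputData->mBuffers[0..i_channels-1].mData with one channel each, then:
status = ExtAudioFileWrite(output_file, num_frame_out_per_block, outputData);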
Summary
The most important part of encoding audio files with Core Audio is finding the right AudioStreamBasicDescription. With AudioFileGetGlobalInfo you can start from the file type, find a suitable data format, and finally arrive at a suitable AudioStreamBasicDescription. The rest of the work can be handed to ExtAudioFile, which gets it done simply and efficiently.