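Decoding AAC audio to PCM (using the faad2 library)
Optional background reading before this post: 《音频相关基础知识》 (Audio Basics): https://blog.csdn.net/qq_41824928/article/details/108124382
1. Using the faad2 library
Decoding AAC to PCM with faad2 mainly involves the following functions: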
NeAACDecHandle NeAACDecOpen(void);
void NeAACDecClose(NeAACDecHandle hDecoder);
NeAACDecConfigurationPtr NeAACDecGetCurrentConfiguration(NeAACDecHandle hDecoder);
unsigned char NeAACDecSetConfiguration(NeAACDecHandle hDecoder, NeAACDecConfigurationPtr config);
long NeAACDecInit(NeAACDecHandle hDecoder, unsigned char *buffer, unsigned long buffer_size, unsigned long *samplerate, unsigned char *channels);
void* NeAACDecDecode(NeAACDecHandle hDecoder, NeAACDecFrameInfo *hInfo, unsigned char *buffer, unsigned long buffer_size);
We go through them from top to bottom:
1.1 Opening and closing the handle
typedef void *NeAACDecHandle;
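Create a handle for AAC decoding. NeAACDecHandle is an opaque void* that refers to decoder state allocated by the library, so the handle must be closed after use to release those resources.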
NeAACDecHandle NeAACDecOpen(void);
Close an AAC decoding handle:
void NeAACDecClose(NeAACDecHandle hDecoder);
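Since every opened handle has to be closed, one option is to tie the close call to scope exit. The following is a minimal sketch of that pattern, not part of the original code. To keep it compilable without faad2, demoOpen and demoClose are hypothetical stand-ins for NeAACDecOpen and NeAACDecClose, and DecoderHandle and openDecoder are names introduced here for illustration.

```cpp
#include <memory>

// Hypothetical stand-ins for NeAACDecOpen / NeAACDecClose so the sketch
// compiles without faad2; demoClose also counts how often it is called.
static int g_closeCalls = 0;
inline void *demoOpen() { return new int(0); }
inline void demoClose(void *h) {
    delete static_cast<int *>(h);
    ++g_closeCalls;
}

// Owning handle: the deleter runs automatically when the owner goes out of
// scope, so the close call cannot be forgotten on early returns.
using DecoderHandle = std::unique_ptr<void, void (*)(void *)>;

inline DecoderHandle openDecoder() {
    return DecoderHandle(demoOpen(), demoClose);
}
```

In real code the two stand-ins would be replaced by NeAACDecOpen and NeAACDecClose, and h.get() would be passed to the other faad2 functions.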
1.2 Getting and setting decoder parameters
typedef struct NeAACDecConfiguration
{
    unsigned char defObjectType;
    unsigned long defSampleRate;
    unsigned char outputFormat;
    unsigned char downMatrix;
    unsigned char useOldADTSFormat;
    unsigned char dontUpSampleImplicitSBR;
} NeAACDecConfiguration, *NeAACDecConfigurationPtr;
Get the current decoder configuration:
NeAACDecConfigurationPtr NeAACDecGetCurrentConfiguration(NeAACDecHandle hDecoder);
Set the decoder configuration:
unsigned char NeAACDecSetConfiguration(NeAACDecHandle hDecoder, NeAACDecConfigurationPtr config);
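When setting the decoder configuration, four fields matter most: defSampleRate, defObjectType, outputFormat and dontUpSampleImplicitSBR. (In practice defSampleRate and defObjectType can be set fairly freely, because for a stream with an ADTS/ADIF header NeAACDecInit takes the actual values from the header.)
outputFormat: sets the bit depth of the output PCM. For the concept of bit depth, see the collected post 《音频相关基础知识》 (Audio Basics): https://blog.csdn.net/qq_41824928/article/details/108124382
Setting dontUpSampleImplicitSBR to 1 stops NeAACDecInit from doubling the reported sample rate when it is less than or equal to 24000 Hz. The relevant part of the NeAACDecInit source is: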
if (*samplerate <= 24000 && (hDecoder->config.dontUpSampleImplicitSBR == 0))
{
    *samplerate *= 2;
    hDecoder->forceUpSampling = 1;
} else if (*samplerate > 24000 && (hDecoder->config.dontUpSampleImplicitSBR == 0)) {
    hDecoder->downSampledSBR = 1;
}
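To make the effect of the flag concrete, here is a small sketch that models only the branch quoted above. The function name reportedSampleRate is introduced here for illustration and is not a faad2 function.

```cpp
#include <cstdint>

// Models the quoted branch of NeAACDecInit: a header sample rate of
// 24000 Hz or less is doubled unless dontUpSampleImplicitSBR is set.
// Illustrative helper only, not part of faad2.
inline uint32_t reportedSampleRate(uint32_t headerRate, bool dontUpSampleImplicitSBR) {
    if (headerRate <= 24000 && !dontUpSampleImplicitSBR) {
        return headerRate * 2;
    }
    return headerRate;
}
```

For a 22050 Hz stream this yields 44100 with the flag cleared and 22050 with the flag set, which is why the code in this post sets the flag to 1.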
1.3 Initializing the decoder
/*
* @brief Initialize the decoder from the AAC ADTS/ADIF header. Call this only once!
* @param hDecoder[in] handle created by NeAACDecOpen
* buffer[in] start address of the AAC data
* buffer_size[in] length of the AAC data
* samplerate[out] sample rate
* channels[out] number of channels
* @return -1 on failure
*/
long NeAACDecInit(NeAACDecHandle hDecoder,
                  unsigned char *buffer,
                  unsigned long buffer_size,
                  unsigned long *samplerate,
                  unsigned char *channels);
1.4 Decoding AAC to PCM
typedef struct NeAACDecFrameInfo
{
    unsigned long bytesconsumed;
    unsigned long samples;
    unsigned char channels;
    unsigned char error;
    unsigned long samplerate;
    /* SBR: 0: off, 1: on; upsample, 2: on; downsampled, 3: off; upsampled */
    unsigned char sbr;
    /* MPEG-4 ObjectType */
    unsigned char object_type;
    /* AAC header type; MP4 will be signalled as RAW also */
    unsigned char header_type;
    /* multichannel configuration */
    unsigned char num_front_channels;
    unsigned char num_side_channels;
    unsigned char num_back_channels;
    unsigned char num_lfe_channels;
    unsigned char channel_position[64];
    /* PS: 0: off, 1: on */
    unsigned char ps;
} NeAACDecFrameInfo;
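/*
* @brief Decode AAC to PCM
* @param hDecoder[in] handle created by NeAACDecOpen
* hInfo[out] information about the decoded frame
* buffer[in] start address of the AAC data
* buffer_size[in] length of the AAC data
* @return on success, the start address of the PCM data (the length must be computed by the caller); on failure, NULL, with the error code available in hInfo->error
*/
void* NeAACDecDecode(NeAACDecHandle hDecoder,
                     NeAACDecFrameInfo *hInfo,
                     unsigned char *buffer,
                     unsigned long buffer_size);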
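On success NeAACDecDecode returns a pointer to the start of the PCM data, but not its length in bytes, which we have to compute ourselves. hInfo->samples is the number of decoded samples across all channels, so the byte length is hInfo->samples multiplied by the number of bytes per sample. For background on computing PCM sizes, see the bold red section at the bottom of the collected post 《音频相关基础知识》 (Audio Basics): https://blog.csdn.net/qq_41824928/article/details/108124382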
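The length computation can be sketched as a small helper. It assumes that the sample count passed in is the total across all channels, as reported in the samples field of NeAACDecFrameInfo, and that the bit depth is a multiple of 8. The name pcmByteLength is introduced here for illustration.

```cpp
#include <cstddef>

// Byte length of one decoded frame of interleaved PCM.
// totalSamples:  sample count across all channels
//                (the value NeAACDecFrameInfo::samples reports).
// bitsPerSample: 16 when outputFormat is FAAD_FMT_16BIT.
inline std::size_t pcmByteLength(unsigned long totalSamples, unsigned bitsPerSample) {
    return static_cast<std::size_t>(totalSamples) * (bitsPerSample / 8);
}
```

For example, a stereo frame with 1024 samples per channel has 2048 samples in total, which at 16 bits is 4096 bytes.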
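2. Splitting AAC data into frames
faad2's decode function NeAACDecDecode expects one AAC frame as input. With less than a full frame, decoding fails; with more than one frame but less than two, only the first frame is decoded and the remaining data is discarded.
Therefore, before calling NeAACDecDecode we need to cut the AAC audio into individual frames. An AAC frame with an ADTS header starts with the syncword (always 0xFFF), and layer is always '00', so we can use the syncword to split the AAC data.
If you pull the AAC stream with srs_librtmp, you can skip this section, because the audio data returned by srs_rtmp_read_packet is always one complete frame.
For details on the ADTS header and on pulling an AAC stream with srs_librtmp, see the post 《RTMP拉流保存aac(flv保存为aac)》 (Pulling an RTMP stream and saving AAC): https://blog.csdn.net/qq_41824928/article/details/107636845
The first 2 bytes of adts_fixed_header contain the following fields:
- syncword (12 bits): marks the start of a frame, always 0xFFF
- ID (1 bit): MPEG identifier. 0 means MPEG-4, 1 means MPEG-2
- layer (2 bits): always '00'
- p_a (protection_absent, 1 bit): whether error checking is present. 0 means a CRC is present, 1 means no CRC. The CRC is 2 bytes (16 bits)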
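The four fields listed above can be unpacked from the first two header bytes as in the following sketch. The struct and function names (AdtsSyncFields, parseAdtsSyncFields, looksLikeAdtsStart) are introduced here for illustration only.

```cpp
#include <cstdint>

// Fields packed into the first two bytes of adts_fixed_header.
struct AdtsSyncFields {
    uint16_t syncword;         // 12 bits, expected to be 0xFFF
    uint8_t id;                // 1 bit: 0 = MPEG-4, 1 = MPEG-2
    uint8_t layer;             // 2 bits, expected to be 0
    uint8_t protectionAbsent;  // 1 bit: 1 = no CRC, 0 = CRC present
};

inline AdtsSyncFields parseAdtsSyncFields(uint8_t b0, uint8_t b1) {
    AdtsSyncFields f;
    f.syncword = static_cast<uint16_t>((b0 << 4) | (b1 >> 4));
    f.id = static_cast<uint8_t>((b1 >> 3) & 0x1);
    f.layer = static_cast<uint8_t>((b1 >> 1) & 0x3);
    f.protectionAbsent = static_cast<uint8_t>(b1 & 0x1);
    return f;
}

// True when the two bytes could be the start of an ADTS frame.
inline bool looksLikeAdtsStart(uint8_t b0, uint8_t b1) {
    const AdtsSyncFields f = parseAdtsSyncFields(b0, b1);
    return f.syncword == 0xFFF && f.layer == 0;
}
```

The common byte pair 0xFF 0xF1 decodes to syncword 0xFFF, ID 0 (MPEG-4), layer 0 and protection_absent 1 (no CRC).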
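The adts_header consists of 2 parts of 28 bits each. aac_frame_length, the 13-bit length of the whole frame (header included), is in the second part. So we can locate the start of a frame by checking syncword and layer, and then read aac_frame_length to get the length of the frame. The code is as follows: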
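//the first byte of an AAC frame must be 0xff
std::string::size_type pos = aacData.find(static_cast<char>(0xff), 0);
//a complete ADTS header needs 7 bytes starting at pos
if (pos == std::string::npos || pos + 7 > aacData.size()) {
    return "";
}
//in the second byte, the high 4 bits must be 0xf and the 2 layer bits (mask 0x06) must be 0
uint8_t code = aacData[pos+1];
if (((code & (uint8_t)0xf0) != 0xf0) || (code & (uint8_t)0x06) != 0) {
    return "";
}
//parse the 13-bit aac_frame_length: 2 high bits from byte 3, 8 bits from byte 4, 3 low bits from byte 5
uint8_t i = aacData[pos+3];
uint8_t j = aacData[pos+4];
uint8_t k = aacData[pos+5];
uint32_t aacFrameLen = ((i & 0x3) << 11) + (j << 3) + ((k & 0xe0) >> 5);
return aacData.substr(pos, aacFrameLen);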
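The snippet above extracts a single frame. As a hedged sketch of how the same checks extend to a buffer holding several frames, the following standalone function returns every complete frame plus the offset where unconsumed data begins. SplitResult, adtsFrameLength and splitAdtsFrames are names introduced here, not part of faad2 or of the class shown later. Note that the two high bits of the 13-bit aac_frame_length sit at bit positions 11 and 12, so they are shifted left by 11.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads the 13-bit aac_frame_length from an ADTS header starting at p
// (at least 6 readable bytes are required).
inline uint32_t adtsFrameLength(const uint8_t *p) {
    return ((p[3] & 0x03u) << 11) | (static_cast<uint32_t>(p[4]) << 3) |
           ((p[5] & 0xE0u) >> 5);
}

// Complete frames found in one pass, plus the offset of the first byte
// that was not consumed (start of a partial frame or header).
struct SplitResult {
    std::vector<std::string> frames;
    std::size_t consumed = 0;
};

inline SplitResult splitAdtsFrames(const std::string &data) {
    SplitResult out;
    const uint8_t *p = reinterpret_cast<const uint8_t *>(data.data());
    std::size_t pos = 0;
    while (pos + 7 <= data.size()) {
        // syncword 0xFFF and layer '00': byte 0 is 0xFF, byte 1 masked with 0xF6 is 0xF0
        const bool sync = p[pos] == 0xFF && (p[pos + 1] & 0xF6) == 0xF0;
        const uint32_t len = sync ? adtsFrameLength(p + pos) : 0;
        if (!sync || len < 7) {  // not a plausible header: retry one byte later
            ++pos;
            continue;
        }
        if (pos + len > data.size()) {  // header found, frame still incomplete
            break;
        }
        out.frames.push_back(data.substr(pos, len));
        pos += len;
    }
    out.consumed = pos;
    return out;
}
```

A caller would typically erase the first consumed bytes from its buffer after each pass and append newly received data before the next one.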
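3. Summary
1. Open a handle with NeAACDecOpen, fetch the configuration struct with NeAACDecGetCurrentConfiguration, and apply the settings with NeAACDecSetConfiguration.
2. Obtain AAC frame data (either by parsing the adts_header, or by pulling an RTMP stream directly with srs_librtmp, which yields whole frames).
3. Call NeAACDecInit to initialize the NeAACDecHandle with AAC frame data.
4. Call NeAACDecDecode to decode each AAC frame into PCM data.
5. Play the PCM with ffplay
(for example, for PCM at 44100 Hz, stereo (2 channels), 16-bit: ffplay -ar 44100 -channels 2 -f s16le -i xxx.pcm)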
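Raw PCM carries no header, so the playback parameters have to match the decoder output. One quick sanity check is to compare the expected duration computed from the file size with the real length of the audio. The helper below is a sketch with a name (pcmDurationSeconds) introduced here for illustration; it assumes interleaved PCM with a bit depth that is a multiple of 8.

```cpp
#include <cstdint>

// Playback duration in seconds of raw interleaved PCM of the given size.
inline double pcmDurationSeconds(uint64_t byteCount, uint32_t sampleRate,
                                 uint32_t channels, uint32_t bitsPerSample) {
    const uint64_t bytesPerSecond =
        static_cast<uint64_t>(sampleRate) * channels * (bitsPerSample / 8);
    if (bytesPerSecond == 0) {
        return 0.0;  // invalid parameters, avoid dividing by zero
    }
    return static_cast<double>(byteCount) / static_cast<double>(bytesPerSecond);
}
```

At 44100 Hz, 2 channels and 16 bits, one second of audio is 176400 bytes. If the computed duration is about half or double the real one, the sample rate or channel count passed to the player is likely wrong.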
4. Code
aacToPcm.cpp:
#include <cstdio>
#include "aacToPcm.h"
AacToPcm::AacToPcm(uint64_t maxParseSize)
    : _maxParseSize(maxParseSize)
    , _lastAacPos(0)
    , _aac2pcmInitFlag(false)
    , _hDecoder(nullptr)
{
}
AacToPcm::~AacToPcm() {
    if (_hDecoder) {
        NeAACDecClose(_hDecoder);
        _hDecoder = nullptr;
    }
}
int AacToPcm::init() {
    _hDecoder = NeAACDecOpen();
    if (!_hDecoder) {
        return -1;
    }
    NeAACDecConfigurationPtr config = NeAACDecGetCurrentConfiguration(_hDecoder);
    config->defSampleRate = 80000;
    config->defObjectType = HE_AAC;
    config->outputFormat = FAAD_FMT_16BIT; //set the bit depth to 16 bits
    // The relevant part of the NeAACDecInit source is:
    // if (*samplerate <= 24000 && (hDecoder->config.dontUpSampleImplicitSBR == 0))
    // {
    //     *samplerate *= 2;
    //     hDecoder->forceUpSampling = 1;
    // } else if (*samplerate > 24000 && (hDecoder->config.dontUpSampleImplicitSBR == 0)) {
    //     hDecoder->downSampledSBR = 1;
    // }
    // Setting dontUpSampleImplicitSBR to 1 stops NeAACDecInit from doubling
    // the sample rate when it is less than or equal to 24000
    config->dontUpSampleImplicitSBR = 1;
    NeAACDecSetConfiguration(_hDecoder, config);
    return 0;
}
std::string AacToPcm::aacToPcm(const std::string &aacData) {
    if (!_aac2pcmInitFlag) {
        //initialize the decoder once, from the header of the first frame
        unsigned long lSampleRate = 0;
        unsigned char nChannel = 0;
        if (NeAACDecInit(_hDecoder, (uint8_t *)aacData.data(), aacData.size(), &lSampleRate, &nChannel) < 0) {
            return "";
        }
        _aac2pcmInitFlag = true;
    }
    NeAACDecFrameInfo hInfo = {};
    char *pData = (char *)NeAACDecDecode(_hDecoder, &hInfo, (uint8_t *)aacData.data(), aacData.size());
    if (hInfo.error > 0 || !pData) {
        fprintf(stderr, "NeAACDecDecode error(%d)\r\n", hInfo.error);
        return "";
    }
    //hInfo.samples counts samples across all channels, so no fixed frame size is assumed;
    //the bit depth is 16 bits (FAAD_FMT_16BIT)
    size_t dataSize = hInfo.samples * 16 / 8;
    return std::string(pData, dataSize);
}
int AacToPcm::inputAacData(uint8_t *data, uint32_t len) {
    //drop the bytes consumed by the previous split, keeping the trailing partial frame
    if (_lastAacPos > 0 && static_cast<size_t>(_lastAacPos) < _aacData.size()) {
        _aacData.erase(0, _lastAacPos);
    }
    _lastAacPos = 0;
    //if the buffered data stays below _maxParseSize, append;
    //otherwise discard the previously buffered data
    if (len + _aacData.size() < _maxParseSize) {
        _aacData.append((const char*)data, len);
    } else if (len > _maxParseSize) {
        fprintf(stderr, "data size(%u) is too large to contain\n", len);
        return -1;
    } else {
        _aacData = std::string((const char*)data, len);
    }
    if (splitAacData()) {
        return -1;
    }
    //decode every complete frame and queue the resulting PCM
    for (const auto &aac : _aacSplitData) {
        std::string pcm = aacToPcm(aac);
        if (!pcm.empty()) {
            _pcmQueue.push(std::move(pcm));
        }
    }
    return 0;
}
std::string AacToPcm::outputPcmData() {
    if (_pcmQueue.empty()) {
        return "";
    }
    std::string pcm = std::move(_pcmQueue.front());
    _pcmQueue.pop();
    return pcm;
}
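int AacToPcm::splitAacData() {
    _aacSplitData.clear();
    std::string::size_type rpos = _aacData.find(static_cast<char>(0xff), 0);
    _lastAacPos = -1;
    while (rpos != std::string::npos) {
        // an ADTS header is 7 bytes long; keep an incomplete header for the next call
        if (rpos + 7 > _aacData.size()) {
            _lastAacPos = rpos;
            break;
        }
        // in the ADTS header, syncword marks the start of a frame and is always 0xFFF;
        // layer is always '00'
        uint8_t code = _aacData[rpos+1];
        if (((code & (uint8_t)0xf0) != 0xf0) || (code & (uint8_t)0x06) != 0) {
            rpos = _aacData.find(static_cast<char>(0xff), rpos+1);
            continue;
        }
        uint8_t i = _aacData[rpos+3];
        uint8_t j = _aacData[rpos+4];
        uint8_t k = _aacData[rpos+5];
        // 13-bit aac_frame_length: 2 high bits from byte 3, 8 bits from byte 4, 3 low bits from byte 5
        uint32_t aacFrameLen = ((i & 0x3) << 11) + (j << 3) + ((k & 0xe0) >> 5);
        // a length shorter than the header cannot be a real frame (and a length of 0
        // would loop forever), so treat it as a false sync and search again
        if (aacFrameLen < 7) {
            rpos = _aacData.find(static_cast<char>(0xff), rpos+1);
            continue;
        }
        if (rpos + aacFrameLen > _aacData.size()) {
            _lastAacPos = rpos;
            break;
        } else {
            _aacSplitData.push_back(_aacData.substr(rpos, aacFrameLen));
            // the last segment is exactly one complete AAC frame
            if (rpos + aacFrameLen == _aacData.size()) {
                _lastAacPos = -1;
                _aacData.clear();
                break;
            }
        }
        rpos = _aacData.find(static_cast<char>(0xff), rpos+aacFrameLen);
    }
    // no further 0xff byte: nothing left worth keeping, so clear the buffer
    // to avoid decoding the same frames again on the next call
    if (rpos == std::string::npos) {
        _aacData.clear();
    }
    return _aacSplitData.empty() ? -1 : 0;
}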
aacToPcm.h:
#ifndef AACTOPCM_H
#define AACTOPCM_H
#include <cstdint>
#include <queue>
#include <string>
#include <vector>
#include "neaacdec.h"
class AacToPcm {
public:
    explicit AacToPcm(uint64_t maxParseSize);
    ~AacToPcm();
    int inputAacData(uint8_t *data, uint32_t len);
    std::string outputPcmData();
    int init();
protected:
    int splitAacData();
    std::string aacToPcm(const std::string &aacData);
private:
    uint64_t _maxParseSize;
    std::vector<std::string> _aacSplitData;
    std::queue<std::string> _pcmQueue;
    int _lastAacPos;
    std::string _aacData;
    bool _aac2pcmInitFlag;
    NeAACDecHandle _hDecoder;
};
#endif
main.cpp:
#include <cstdio>
#include <memory>
#include "aacToPcm.h"
int main() {
    //open two files: read the AAC file, decode it, and save the result as a PCM file
    FILE *fpIn = fopen("D:\\music.aac", "rb");
    if (!fpIn) {
        return -1;
    }
    FILE *fpOut = fopen("D:\\music.pcm", "wb");
    if (!fpOut) {
        fclose(fpIn);
        return -1;
    }
    //construct and initialize the AacToPcm object; the maximum parse size is up to you
    std::shared_ptr<AacToPcm> aac2Pcm = std::make_shared<AacToPcm>(100*1024);
    if (aac2Pcm->init()) {
        fclose(fpIn);
        fclose(fpOut);
        return -1;
    }
    uint8_t aacBuf[1024] = {0};
    while (true) {
        //read AAC data in a loop
        size_t ret = fread(aacBuf, 1, sizeof(aacBuf), fpIn);
        if (ret == 0) {
            break;
        }
        //feed in the AAC data
        if (aac2Pcm->inputAacData(aacBuf, (uint32_t)ret)) {
            continue;
        }
        //drain the buffered PCM data and write it to the output file
        while (true) {
            std::string pcm = aac2Pcm->outputPcmData();
            if (pcm.empty()) {
                break;
            }
            fwrite(pcm.data(), 1, pcm.size(), fpOut);
        }
    }
    fclose(fpIn);
    fclose(fpOut);
    return 0;
}
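5. Code download
1. Function: decode AAC audio to a PCM file with the faad2 library. To change the input and output files, edit the file names in main.
2. Builds with vs2017 and with gcc on Linux (gcc 4.8.5 on CentOS 7.8).
3. The faad2 version used is 2_9_1. Its source is in the depend directory; open depends\faad\faad2.sln to build the library directly with vs2017.
4. The lib directory contains the libraries built with vs2017 as well as the static and shared libraries built on Linux (gcc 4.8.5).
5. The build uses CMake. Install cmake 3.14 or later, or lower the CMake version number in CMakeLists.txt to one no higher than your installed cmake.
6. Download: https://download.csdn.net/download/qq_41824928/14122879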