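Using faac and faad
The faac and faad usage in this post follows the source code of SpecialAAAC.exe by 雷霄骅 (Lei Xiaohua, known as "雷神"). I strongly recommend his blog: it is free and comprehensive, although some of the functions it uses are a bit dated QAQ
Background
I needed to convert AAC to PCM and PCM to AAC. Pulling in the whole ffmpeg library felt too heavy for that, so I decided to use the faac and faad libraries instead.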
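Building faac and faad2 on Windows
- Source download: I originally got the code from a gitcode mirror that copied the whole GitHub repository, but that mirror no longer seems to exist. The source can be downloaded from GitHub: the account that hosts faad2 has both faac and faad
- faac is easy to build because it ships its own .sln file. After unpacking, open faac.sln under project/msvc, let Visual Studio convert it to VS2019 (or whichever version you use), and build. I hit no errors. When the build finishes, the faac library is in the bin folder
- faad has no Visual Studio project files and is built with CMake. VS2022 can add CMake support during installation. If you do not have that, install CMake for Windows separately and generate the VS project files from cmd. I hit no errors here either; the output ends up under out/
Note: if you really cannot get a build working on Windows, 雷神's project files include faac/faad libraries. They seem to be wrapped a little, so you have to reference a few more things. I tried them: the basic interface is no different, and they work fine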
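Using the faac library
faac is the encoding library. Below are its most important functions and structs; for the rest, read the documentation in the docs folder of the faac source tree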
faacEncOpen
faacEncHandle FAACAPI faacEncOpen(unsigned long sampleRate,
unsigned int numChannels,
unsigned long *inputSamples,
unsigned long *maxOutputBytes
);
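Purpose: opens an encoder; faacEncHandle is the handle to that encoder
Parameters: the sample rate and channel count are inputs
The two pointers are outputs filled in by faacEncOpen. You do not need to initialize them; just pass the addresses of two unsigned long variables. They tell you how many audio samples to feed into each encode call and the maximum number of bytes one encoded AAC frame can take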
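To make the meaning of inputSamples concrete, here is a minimal sketch of the buffer arithmetic. It assumes inputSamples counts samples across all channels (for stereo this is typically 2048, i.e. 1024 per channel; treat that figure as an assumption and always use the value faacEncOpen actually returns). pcmBytesPerEncodeCall is a helper name made up for this example, not part of faac.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of interleaved PCM that one encode call consumes.
// inputSamples: the value written by faacEncOpen (assumed to cover all channels).
// bitsPerSample: PCM bit depth, for example 16.
std::size_t pcmBytesPerEncodeCall(unsigned long inputSamples, unsigned bitsPerSample)
{
    assert(bitsPerSample % 8 == 0); // whole bytes per sample only
    return static_cast<std::size_t>(inputSamples) * (bitsPerSample / 8);
}
```

With 16-bit stereo input and inputSamples equal to 2048, each encode call would therefore consume 4096 bytes of PCM.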
faacEncGetCurrentConfiguration
faacEncConfigurationPtr FAACAPI
faacEncGetCurrentConfiguration(faacEncHandle hEncoder);
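Purpose: returns a pointer to the configuration struct of the given encoder, faacEncConfigurationPtr
Note: before configuring a faac encoder, fetch this pointer first and then change the settings through it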
faacEncConfigurationPtr
typedef struct faacEncConfiguration
{
/* config version */
int version;
/* library version */
char *name;
/* copyright string */
char *copyright;
/* MPEG version, 2 or 4 */
unsigned int mpegVersion;
/* AAC object type */
unsigned int aacObjectType;
union {
/* Joint coding mode */
unsigned int jointmode;
/* compatibility alias */
unsigned int allowMidside;
};
/* Use one of the channels as LFE channel */
unsigned int useLfe;
/* Use Temporal Noise Shaping */
unsigned int useTns;
/* bitrate / channel of AAC file */
unsigned long bitRate;
/* AAC file frequency bandwidth */
unsigned int bandWidth;
/* Quantizer quality */
unsigned long quantqual;
/* Bitstream output format (0 = Raw; 1 = ADTS) */
unsigned int outputFormat;
/* psychoacoustic model list */
psymodellist_t *psymodellist;
/* selected index in psymodellist */
unsigned int psymodelidx;
/*
PCM Sample Input Format
0 FAAC_INPUT_NULL invalid, signifies a misconfigured config
1 FAAC_INPUT_16BIT native endian 16bit
2 FAAC_INPUT_24BIT native endian 24bit in 24 bits (not implemented)
3 FAAC_INPUT_32BIT native endian 24bit in 32 bits (DEFAULT)
4 FAAC_INPUT_FLOAT 32bit floating point
*/
unsigned int inputFormat;
/* block type enforcing (SHORTCTL_NORMAL/SHORTCTL_NOSHORT/SHORTCTL_NOLONG) */
int shortctl;
/*
Channel Remapping
Default 0, 1, 2, 3 ... 63 (64 is MAX_CHANNELS in coder.h)
WAVE 4.0 2, 0, 1, 3
WAVE 5.0 2, 0, 1, 3, 4
WAVE 5.1 2, 0, 1, 4, 5, 3
AIFF 5.1 2, 0, 3, 1, 4, 5
*/
int channel_map[64];
int pnslevel;
} faacEncConfiguration, *faacEncConfigurationPtr;
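The comments on this configuration struct in the header file are fairly clear. The fields we mainly set are inputFormat, outputFormat, aacObjectType and bitRate
inputFormat: the sample format (bit depth) of the input PCM data, usually 16-bit
outputFormat: two options, with ADTS headers (1) or without (0, raw). The difference is whether the stream information sits once at the start or every frame packet carries its own header. Because each ADTS frame has a header, an ADTS stream can be played and parsed starting from the middle, whereas a format such as ADIF, with a single header at the start, can only be parsed from the beginning
bitRate: the bitrate per channel; it is set to 64 kbps here, that is, 64 kbps for each channel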
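Since ADTS puts a header on every frame, it may help to see what that header holds. The sketch below reads the main fields of a 7-byte ADTS header following the standard bit layout. AdtsHeaderInfo and parseAdtsHeader are hypothetical names for illustration and are not part of faac.

```cpp
#include <cstddef>
#include <cstdint>

// Main fields of a 7-byte ADTS header (illustrative struct, not a faac type).
struct AdtsHeaderInfo {
    bool valid = false;
    unsigned profile = 0;         // 2-bit field: audio object type minus 1
    unsigned sampleRateIndex = 0; // index into the standard sampling-rate table
    unsigned channelConfig = 0;   // channel configuration
    unsigned frameLength = 0;     // header plus payload, in bytes
};

AdtsHeaderInfo parseAdtsHeader(const std::uint8_t *p, std::size_t size)
{
    AdtsHeaderInfo info;
    if (p == nullptr || size < 7) {
        return info;
    }
    // 12-bit syncword 0xFFF; the two layer bits after the ID bit must be 0.
    if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0 || (p[1] & 0x06) != 0) {
        return info;
    }
    info.profile = (p[2] >> 6) & 0x03;
    info.sampleRateIndex = (p[2] >> 2) & 0x0F;
    info.channelConfig = ((p[2] & 0x01) << 2) | ((p[3] >> 6) & 0x03);
    // frame_length is 13 bits: 2 from byte 3, 8 from byte 4, 3 from byte 5.
    info.frameLength = ((p[3] & 0x03u) << 11) | (p[4] << 3) | ((p[5] >> 5) & 0x07);
    info.valid = info.frameLength >= 7;
    return info;
}
```

A decoder or player can use frameLength to jump from one header to the next, which is what makes mid-stream parsing possible.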
faacEncSetConfiguration
int FAACAPI faacEncSetConfiguration(faacEncHandle hEncoder,
faacEncConfigurationPtr config);
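Purpose: applies the encoding parameters. Call it once after you have filled in the configuration struct. Do not skip it just because faacEncGetCurrentConfiguration returned a pointer
Parameters: hEncoder: the handle returned by faacEncOpen
config: the configuration pointer you just filled in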
faacEncEncode
int FAACAPI faacEncEncode(faacEncHandle hEncoder,
int32_t * inputBuffer,
unsigned int samplesInput,
unsigned char *outputBuffer,
unsigned int bufferSize);
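Purpose: encodes PCM to AAC
Parameters: hEncoder: the encoder handle
inputBuffer: pointer to the PCM data
samplesInput: number of audio samples passed in; this is the inputSamples value returned by faacEncOpen
outputBuffer: where the encoded data is written
bufferSize: size of outputBuffer, i.e. the maximum length of the encoded AAC data (maxOutputBytes from faacEncOpen)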
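The wrapper later in this post passes a byte buffer cast directly to int32_t*. As a hedged alternative, the sketch below first copies the bytes into typed, properly aligned storage, assuming native-endian 16-bit PCM (the FAAC_INPUT_16BIT case). pcmBytesToSamples is a made-up helper for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Copy raw native-endian 16-bit PCM bytes into typed, aligned storage.
// A trailing odd byte (an incomplete sample) is ignored.
std::vector<std::int16_t> pcmBytesToSamples(const std::string &bytes)
{
    std::vector<std::int16_t> samples(bytes.size() / sizeof(std::int16_t));
    if (!samples.empty()) {
        std::memcpy(samples.data(), bytes.data(),
                    samples.size() * sizeof(std::int16_t));
    }
    return samples;
}
```

The resulting vector's data() pointer and size() could then serve as inputBuffer (after a cast) and samplesInput, provided size() matches the sample count the encoder expects.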
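faac encoding workflow
- Call faacEncOpen with the sample rate and channel count to open an encoder handle; it returns the number of input samples per call (inputSamples) and the maximum size of one encoded AAC frame (maxOutputBytes)
- Allocate the output buffer for the encoded data based on maxOutputBytes. I simply allocated a generously large buffer, which also works
- Get the configuration pointer with faacEncGetCurrentConfiguration, fill it in, and apply it with faacEncSetConfiguration
- Call faacEncEncode to encode the audio
- Close the handle with faacEncClose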
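The encode step expects a fixed number of samples per call, so incoming PCM usually has to be buffered and cut into equal chunks. Below is a minimal sketch of that loop with the encoder replaced by a callback, so it runs without faac. drainFullFrames and the encodeFrame hook are hypothetical names; in real code the callback would wrap a call such as faacEncEncode.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Feed buffered PCM to an encoder callback in fixed-size chunks.
// `pending` keeps whatever is left over for the next call.
// Returns the number of chunks handed to the callback.
std::size_t drainFullFrames(
    std::string &pending, std::size_t frameBytes,
    const std::function<void(const char *, std::size_t)> &encodeFrame)
{
    if (frameBytes == 0) {
        return 0; // nothing sensible to do without a chunk size
    }
    std::size_t offset = 0;
    std::size_t frames = 0;
    while (pending.size() - offset >= frameBytes) {
        encodeFrame(pending.data() + offset, frameBytes);
        offset += frameBytes;
        ++frames;
    }
    pending.erase(0, offset); // a single erase instead of one per chunk
    return frames;
}
```

Erasing once after the loop avoids shifting the remaining bytes for every chunk, which matters when a large block of PCM arrives at once.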
A wrapped faac encoder
Encoder .h file
#include <cstdint>
#include <queue>
#include <string>
#include <faac.h>
class AacFormatConversion
{
public:
explicit AacFormatConversion(uint32_t maxInputSize = 1024 * 1024, uint32_t maxAacQueueSize = 1024);
~AacFormatConversion();
int InputPcmData(const uint8_t *pcmData, uint32_t datasize);
std::string GetAacFrame();
int Init(uint8_t Channel, uint32_t SampleRate, uint32_t BitSize);
private:
faacEncHandle hEncoder;
faacEncConfigurationPtr pConfiguration;
unsigned long m_nInputSamples;
unsigned long m_nMaxOutputBytes;
unsigned long m_nMaxInputBytes;
const uint32_t m_AacMaxQueueSize;
const uint32_t m_inputMaxSize;
std::string m_InputStr;
std::queue<std::string> m_AacQueue;
unsigned char *m_AacBuffer;
};
Encoder .cpp file
#include <cstring>
#include <iostream>
AacFormatConversion::AacFormatConversion(uint32_t maxInputSize, uint32_t maxAacQueueSize)
:hEncoder(nullptr)
, pConfiguration(nullptr)
, m_AacBuffer(nullptr)
, m_nInputSamples(0)
, m_nMaxOutputBytes(0)
, m_nMaxInputBytes(0)
, m_AacMaxQueueSize(maxAacQueueSize)
, m_inputMaxSize(maxInputSize)
{
}
AacFormatConversion::~AacFormatConversion()
{
if (hEncoder != nullptr) {
faacEncClose(hEncoder);
}
delete[]m_AacBuffer;
}
int AacFormatConversion::Init(uint8_t Channel, uint32_t SampleRate, uint32_t BitSize)
{
hEncoder = faacEncOpen(SampleRate, Channel, &m_nInputSamples, &m_nMaxOutputBytes);
if (hEncoder == nullptr) {
std::cerr<<"faac enc open failed";
return -1;
}
m_nMaxInputBytes = m_nInputSamples * BitSize / 8;
m_AacBuffer = new unsigned char[m_nMaxOutputBytes];
pConfiguration =faacEncGetCurrentConfiguration(hEncoder);
pConfiguration->inputFormat = FAAC_INPUT_16BIT;
pConfiguration->outputFormat = ADTS_STREAM;
pConfiguration->aacObjectType = LOW;
pConfiguration->bitRate = 64000;
faacEncSetConfiguration(hEncoder, pConfiguration);
return 0;
}
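int AacFormatConversion::InputPcmData(const uint8_t *pcmData, uint32_t datasize)
{
    // Init() must have succeeded; otherwise m_nMaxInputBytes is 0 and the loop below never ends.
    if (hEncoder == nullptr || m_nMaxInputBytes == 0) {
        std::cerr << "Encoder is not initialized";
        return -1;
    }
    // Check both limits before buffering, so a rejected call leaves no data behind.
    if (datasize + m_InputStr.size() > m_inputMaxSize) {
        std::cerr << "Input Data is too large";
        return -1;
    }
    if (m_AacQueue.size() > m_AacMaxQueueSize) {
        std::cerr << "Aac queue is full";
        return -1;
    }
    m_InputStr.append((const char *)pcmData, datasize);
    while (m_InputStr.size() >= m_nMaxInputBytes) {
        // Encode exactly m_nInputSamples samples (m_nMaxInputBytes bytes) per call.
        int ret = faacEncEncode(hEncoder, (int32_t *)m_InputStr.data(), m_nInputSamples, m_AacBuffer, m_nMaxOutputBytes);
        if (ret > 0) {
            // ret is the number of bytes written to m_AacBuffer for this frame.
            m_AacQueue.emplace((const char *)m_AacBuffer, ret);
        }
        m_InputStr.erase(0, m_nMaxInputBytes);
    }
    return 0;
}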
std::string AacFormatConversion::GetAacFrame()
{
if (m_AacQueue.empty()) {
return "";
}
std::string aac = m_AacQueue.front();
m_AacQueue.pop();
return aac;
}
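With this wrapper the encoder is trivial to use. If needed, change the Init function so that all encoder parameters are configurable; some of them are hard-coded here
Using the faad library
faad is the decoding library. Below are its most important functions and structs; for the rest, read the documentation in the docs folder of the faad source tree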
NeAACDecOpen
NeAACDecHandle NEAACDECAPI NeAACDecOpen(void);
Purpose: opens a decoder; NeAACDecHandle is the handle to that decoder. It takes no arguments
NeAACDecGetCurrentConfiguration
NeAACDecConfigurationPtr NEAACDECAPI NeAACDecGetCurrentConfiguration(NeAACDecHandle hDecoder);
Purpose: same pattern as in faac; returns a pointer to the configuration struct
Parameters: the handle returned by NeAACDecOpen
NeAACDecConfigurationPtr
typedef struct NeAACDecConfiguration
{
unsigned char defObjectType;
unsigned long defSampleRate;
unsigned char outputFormat;
unsigned char downMatrix;
unsigned char useOldADTSFormat;
unsigned char dontUpSampleImplicitSBR;
} NeAACDecConfiguration, *NeAACDecConfigurationPtr;
Struct fields:
defObjectType: takes one of these macros
/* object types for AAC */
#define MAIN 1
#define LC 2
#define SSR 3
#define LTP 4
#define HE_AAC 5
#define ER_LC 17
#define ER_LTP 19
#define LD 23
#define DRM_ER_LC 27 /* special object type for DRM */
defSampleRate: the default sample rate, for example the common CD-quality 44100 Hz or high-resolution 96000 Hz
outputFormat: again defined by macros, as follows
#define FAAD_FMT_16BIT 1
#define FAAD_FMT_24BIT 2
#define FAAD_FMT_32BIT 3
#define FAAD_FMT_FLOAT 4
#define FAAD_FMT_FIXED FAAD_FMT_FLOAT
#define FAAD_FMT_DOUBLE 5
downMatrix: I left this at 0
dontUpSampleImplicitSBR: a flag that controls whether upsampling for implicitly signalled SBR (Spectral Band Replication) is disabled
NeAACDecSetConfiguration
unsigned char NEAACDECAPI NeAACDecSetConfiguration(NeAACDecHandle hDecoder, NeAACDecConfigurationPtr config);
Purpose: applies the decoding parameters; pass in the decoder handle and the configuration pointer
NeAACDecInit
long NEAACDECAPI NeAACDecInit(NeAACDecHandle hDecoder,
unsigned char *buffer,
unsigned long buffer_size,
unsigned long *samplerate,
unsigned char *channels);
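Purpose: initializes the decoder
Parameters: hDecoder: the decoder handle
buffer: the AAC data to read from
buffer_size: length of the AAC data
samplerate: output, filled in by the call
channels: output, also filled in by the call
Returns -1 on failure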
NeAACDecDecode
void* NEAACDECAPI NeAACDecDecode(NeAACDecHandle hDecoder,
NeAACDecFrameInfo *hInfo,
unsigned char *buffer,
unsigned long buffer_size);
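Purpose: decodes one frame
Parameters: hDecoder: the decoder handle
hInfo: pass in a NeAACDecFrameInfo; after the call, use the information in it to read the PCM data
buffer and buffer_size: same as for NeAACDecInit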
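The frame info filled in by the decode call is what tells you how much PCM came back. Here is a small sketch of that arithmetic for 16-bit output. DecodedFrameSummary is an illustrative stand-in that mirrors only the three NeAACDecFrameInfo fields used here, and it assumes that samples counts all channels together.

```cpp
#include <cstddef>

// Illustrative mirror of the frame-info fields used below (not the real struct).
struct DecodedFrameSummary {
    unsigned long samples;    // total samples in the frame, all channels
    unsigned char channels;   // number of output channels
    unsigned long samplerate; // output sample rate in Hz
};

// Size in bytes of the 16-bit PCM block returned for one frame.
std::size_t pcm16Bytes(const DecodedFrameSummary &f)
{
    return static_cast<std::size_t>(f.samples) * 2;
}

// Duration of one frame in milliseconds; 0 if the info is unusable.
double frameDurationMs(const DecodedFrameSummary &f)
{
    if (f.channels == 0 || f.samplerate == 0) {
        return 0.0;
    }
    return 1000.0 * (f.samples / f.channels) / f.samplerate;
}
```

For a stereo frame of 2048 samples at 16000 Hz this gives 4096 bytes and 64 ms.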
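faad decoding workflow
- Call NeAACDecOpen to open a decoder handle
- Get the configuration pointer with NeAACDecGetCurrentConfiguration, fill it in, and apply it with NeAACDecSetConfiguration
- Call NeAACDecInit to initialize before decoding
- Decode with NeAACDecDecode
- Interpret the decoded result using the NeAACDecFrameInfo filled in by NeAACDecDecode
- Close the decoder with NeAACDecClose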
A wrapped faad decoder
.h file
#include <cstdint>
#include <queue>
#include <string>
#include <vector>
#include <neaacdec.h>
class VoiceAacConvertPcm
{
public:
explicit VoiceAacConvertPcm(uint64_t maxParseSize = 1024 * 1024);
~VoiceAacConvertPcm();
int inputAacData(uint8_t *data, uint32_t len);
std::string outputPcmData();
int init();
private:
int splitAacData();
std::string aacToPcm(const std::string &aacData);
std::string convertToMono(const std::string &pcmData);
private:
uint64_t _maxParseSize;
std::vector<std::string> _aacSplitData;
std::queue<std::string> _pcmQueue;
int _lastAacPos;
std::string _aacData;
bool _aac2pcmInitFlag;
NeAACDecHandle _hDecoder;
NeAACDecFrameInfo hInfo;
};
Decoder .cpp file
VoiceAacConvertPcm::VoiceAacConvertPcm(uint64_t maxParseSize)
: _maxParseSize(maxParseSize)
, _lastAacPos(0)
, _aac2pcmInitFlag(false)
, _hDecoder(nullptr)
{
}
VoiceAacConvertPcm::~VoiceAacConvertPcm()
{
if (_hDecoder) {
NeAACDecClose(_hDecoder);
_hDecoder = nullptr;
}
}
int VoiceAacConvertPcm::init() {
_hDecoder = NeAACDecOpen();
if (!_hDecoder) {
return -1;
}
NeAACDecConfigurationPtr config = NeAACDecGetCurrentConfiguration(_hDecoder);
config->defSampleRate = 21000;
config->defObjectType = LC;
config->outputFormat = FAAD_FMT_16BIT;
config->downMatrix = 0;
config->dontUpSampleImplicitSBR = 1;
NeAACDecSetConfiguration(_hDecoder, config);
return 0;
}
std::string VoiceAacConvertPcm::aacToPcm(const std::string &aacData)
{
    unsigned char *input = (unsigned char *)aacData.data();
    if (!_aac2pcmInitFlag) {
        // The first frame initializes the decoder; sample rate and channel count are outputs.
        unsigned long lSampleRate = 0;
        uint8_t nChannel = 0;
        if (NeAACDecInit(_hDecoder, input, aacData.size(), &lSampleRate, &nChannel) < 0) {
            return "";
        }
        _aac2pcmInitFlag = true;
    }
    void *pData = NeAACDecDecode(_hDecoder, &hInfo, input, aacData.size());
    // Every path returns a value: an error or an empty frame yields an empty string.
    if (hInfo.error > 0 || pData == nullptr || hInfo.samples == 0) {
        return "";
    }
    // With FAAD_FMT_16BIT the result holds hInfo.samples interleaved 16-bit samples.
    // Serialize them as little-endian bytes, independent of host byte order.
    const short *samples = (const short *)pData;
    std::string pcm;
    pcm.reserve(hInfo.samples * 2);
    for (unsigned long i = 0; i < hInfo.samples; i++) {
        pcm.push_back((char)(samples[i] & 0xFF));
        pcm.push_back((char)((samples[i] >> 8) & 0xFF));
    }
    // Workaround for mono input decoded as two channels: keep one channel only.
    return hInfo.channels == 2 ? convertToMono(pcm) : pcm;
}
int VoiceAacConvertPcm::inputAacData(uint8_t *data, uint32_t len) {
if (_lastAacPos > 0 && _lastAacPos < _aacData.size()) {
_aacData.erase(0, _lastAacPos);
}
_lastAacPos = 0;
if (len + _aacData.size() < _maxParseSize) {
_aacData.append((const char *)data, len);
}
else if (len > _maxParseSize) {
fprintf(stderr, "data Size(%u) is too large to contain\n", len);
}
else {
_aacData = std::string((const char *)data, len);
}
if (splitAacData()) {
return -1;
}
for (const auto &aac : _aacSplitData) {
std::string pcm = aacToPcm(aac);
if (!pcm.empty()) {
_pcmQueue.push(std::move(pcm));
}
}
return 0;
}
std::string VoiceAacConvertPcm::outputPcmData() {
if (_pcmQueue.empty()) {
return "";
}
std::string pcm = _pcmQueue.front();
_pcmQueue.pop();
return pcm;
}
std::string VoiceAacConvertPcm::convertToMono(const std::string &pcmData) {
std::string monoData;
for (size_t i = 0; i + 1 < pcmData.size(); i += 4) { // keep the 2 bytes of the first channel, never read past the end
monoData.push_back(pcmData[i]);
monoData.push_back(pcmData[i + 1]);
}
return monoData;
}
int VoiceAacConvertPcm::splitAacData() {
_aacSplitData.clear();
std::string::size_type rpos = _aacData.find(0xff, 0);
_lastAacPos = -1;
while (rpos != std::string::npos) {
if (rpos + 7 > _aacData.size()) {
_lastAacPos = rpos;
break;
}
uint8_t code = _aacData[rpos + 1];
if (((code & (uint8_t)0xf0) != 0xf0) || (code & (uint8_t)0x06) != 0) {
rpos = _aacData.find(0xff, rpos + 1);
continue;
}
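uint8_t i = _aacData[rpos + 3];
uint8_t j = _aacData[rpos + 4];
uint8_t k = _aacData[rpos + 5];
// frame_length is a 13-bit field: 2 bits from byte 3, 8 bits from byte 4, 3 bits from byte 5
uint32_t aacFrameLen = ((i & 0x3) << 11) + (j << 3) + ((k & 0xe0) >> 5);
if (aacFrameLen < 7) {
// shorter than an ADTS header, so this was not a real sync word
rpos = _aacData.find(0xff, rpos + 1);
continue;
}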
if (rpos + aacFrameLen > _aacData.size()) {
_lastAacPos = rpos;
break;
}
else {
_aacSplitData.push_back(_aacData.substr(rpos, aacFrameLen));
if (rpos + aacFrameLen == _aacData.size()) {
_lastAacPos = -1;
_aacData.clear();
break;
}
}
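rpos = _aacData.find(0xff, rpos + aacFrameLen);
}
if (rpos == std::string::npos) {
// no further sync byte: everything buffered was either parsed or is not ADTS data,
// so drop it instead of parsing the same frames again on the next call
_aacData.clear();
}
return _aacSplitData.empty() ? -1 : 0;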
}
Note: when decoding, the AAC stream has to be split into frames by ADTS header
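As a standalone illustration of that split, here is a minimal sketch that walks a byte buffer frame by frame using the 13-bit frame length from each ADTS header. splitAdtsFrames is a hypothetical free function written for this example; it is not the class method above and not part of faad.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a byte buffer into complete ADTS frames.
// `consumed` receives the number of leading bytes that were fully handled;
// bytes after that offset belong to a frame that has not fully arrived yet.
std::vector<std::string> splitAdtsFrames(const std::string &buf, std::size_t &consumed)
{
    std::vector<std::string> frames;
    std::size_t pos = 0;
    while (pos + 7 <= buf.size()) {
        const unsigned char b0 = static_cast<unsigned char>(buf[pos]);
        const unsigned char b1 = static_cast<unsigned char>(buf[pos + 1]);
        // Syncword 0xFFF followed by layer bits 00; otherwise skip one byte.
        if (b0 != 0xFF || (b1 & 0xF6) != 0xF0) {
            ++pos;
            continue;
        }
        const unsigned char b3 = static_cast<unsigned char>(buf[pos + 3]);
        const unsigned char b4 = static_cast<unsigned char>(buf[pos + 4]);
        const unsigned char b5 = static_cast<unsigned char>(buf[pos + 5]);
        const std::size_t len = ((b3 & 0x03u) << 11) | (b4 << 3) | (b5 >> 5);
        if (len < 7) { // shorter than the header itself: a false sync
            ++pos;
            continue;
        }
        if (pos + len > buf.size()) {
            break; // incomplete frame, wait for more data
        }
        frames.push_back(buf.substr(pos, len));
        pos += len;
    }
    consumed = pos;
    return frames;
}
```

The caller would erase the first `consumed` bytes from its buffer and append new data behind whatever remains.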
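I ran into one problem along the way: after decoding mono AAC I got stereo PCM back, which broke audio playback later on. The articles I consulted suggest two fixes; I tried both and neither worked
One is to set dontUpSampleImplicitSBR to 1 in the configuration struct
The other is to add an option when running CMake
I also saw an article that patches the source code instead; I have not tried that one
My final solution was to write a function that converts the output straight to mono. It is a bit crude, but it works quite well.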
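The conversion function in this post keeps one channel and drops the other, which is fine when both channels carry the same signal. As a hedged alternative for real stereo input, the sketch below averages each left/right pair instead. downmixStereoToMono is a made-up name, and the code assumes interleaved 16-bit samples.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Downmix interleaved 16-bit stereo (L R L R ...) to mono by averaging each pair.
// An unpaired trailing sample is dropped.
std::vector<std::int16_t> downmixStereoToMono(const std::vector<std::int16_t> &stereo)
{
    std::vector<std::int16_t> mono;
    mono.reserve(stereo.size() / 2);
    for (std::size_t i = 0; i + 1 < stereo.size(); i += 2) {
        // Sum in int so two full-scale samples cannot overflow.
        const int sum = static_cast<int>(stereo[i]) + static_cast<int>(stereo[i + 1]);
        mono.push_back(static_cast<std::int16_t>(sum / 2));
    }
    return mono;
}
```

When the two channels are identical, averaging and dropping a channel produce the same output, so this only changes the result for true stereo material.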
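I suspect something really did go wrong in my build. I built and used this library on Windows, and I am not fluent with CMake on Windows, so I am making do for now. When I have time I will dig into it and see whether I can find the cause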