Audio Encoding and Decoding with FFmpeg (2): Audio Decoding

芥末的无奈

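Audio Decoding

Audio Encoding and Decoding with FFmpeg (1): Hello FFmpeg, Installation and Compilation
Audio Encoding and Decoding with FFmpeg (2): Audio Decoding
Audio Encoding and Decoding with FFmpeg (3): Audio Encoding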

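In Hello FFmpeg we covered how to install FFmpeg and walked through an example of building an FFmpeg program with CMake.

Today we get to the main topic: decoding audio with FFmpeg.

Basics

We start with a few audio concepts that matter throughout the encoding and decoding code that follows.

Interleaved vs Planar

Suppose we have 2-channel audio. In code, there are two ways to lay out the samples in memory.

The first is the interleaved layout, where the channels alternate sample by sample: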

LRLRLRLRLRLRLRLRLRLR

The second is the planar layout, where each channel is stored in its own contiguous block:

LLLLLLLLLL
RRRRRRRRRR
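
To make the two layouts concrete, here is a minimal sketch in plain C++ (no FFmpeg involved) that turns an interleaved buffer into planar form. The helper name `deinterleave` is made up for this illustration and is not an FFmpeg function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split an interleaved buffer (LRLRLR...) into one vector per channel
// (LLL..., RRR...). Illustrative helper only, not part of FFmpeg.
std::vector<std::vector<float>> deinterleave(const std::vector<float>& interleaved,
                                             std::size_t channels)
{
    assert(channels > 0 && interleaved.size() % channels == 0);
    const std::size_t frames = interleaved.size() / channels;
    std::vector<std::vector<float>> planes(channels, std::vector<float>(frames));
    for (std::size_t i = 0; i < frames; ++i) {
        for (std::size_t c = 0; c < channels; ++c) {
            // sample i of channel c sits at index i * channels + c
            planes[c][i] = interleaved[i * channels + c];
        }
    }
    return planes;
}
```

For a stereo buffer, `deinterleave(buffer, 2)` returns two vectors: index 0 holds the left channel and index 1 holds the right channel.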

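Sample Format

The sample format describes the data type of each audio sample and how the samples are arranged. In FFmpeg, the AVSampleFormat enum lists the sample formats, including:

Enumerator	Description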
AV_SAMPLE_FMT_NONE
AV_SAMPLE_FMT_U8	unsigned 8 bits
AV_SAMPLE_FMT_S16	signed 16 bits
AV_SAMPLE_FMT_S32	signed 32 bits
AV_SAMPLE_FMT_FLT	float
AV_SAMPLE_FMT_DBL	double
AV_SAMPLE_FMT_U8P	unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P	signed 16 bits, planar
AV_SAMPLE_FMT_S32P	signed 32 bits, planar
AV_SAMPLE_FMT_FLTP	float, planar
AV_SAMPLE_FMT_DBLP	double, planar

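As you can see, formats without a trailing 'P' are interleaved, and formats with a trailing 'P' are planar. They also differ in bit depth: 8, 16 or 32 bits for the integer formats, while float is 32 bits and double is 64 bits.

Samples are stored either as integers or as floating-point values. Floating-point samples nominally lie in the range [-1, 1].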
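
As an illustration of how an integer format relates to a float format, the sketch below converts signed 16-bit samples to float in plain C++. Dividing by 32768 is one common convention and is an assumption here; `s16_to_float` is a hypothetical helper, not an FFmpeg function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert signed 16-bit samples to float.
// Assumption: scale by 1/32768, so the result lies in [-1, 1).
std::vector<float> s16_to_float(const std::vector<std::int16_t>& in)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<float>(in[i]) / 32768.0f;
        // int16_t spans -32768..32767, so the scaled value stays in [-1, 1)
        assert(out[i] >= -1.0f && out[i] < 1.0f);
    }
    return out;
}
```

With this scaling, the most negative 16-bit value maps to exactly -1, and the most positive value maps to slightly less than 1.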

Show me the code

Enough talk. Here is the code; the explanation follows.


#if defined(__cplusplus)
extern "C"
{
#endif

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswresample/swresample.h>
#include <libavutil/opt.h>

#if defined(__cplusplus)
}
#endif

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

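int main(int argc, char* argv[])
{
    // expect the input file path as the first argument
    if(argc < 2){
        cerr << "usage: " << argv[0] << " <input file>\n";
        return -1;
    }
    const string path = argv[1];

    // open the input file; on failure avformat_open_input frees the context
    AVFormatContext *pFormatContext = avformat_alloc_context();
    if(avformat_open_input(&pFormatContext, path.c_str(), NULL, NULL) < 0){
        cerr << "open file failed\n";
        return -1;
    }

    // read some packets to fill in the stream information
    if(avformat_find_stream_info(pFormatContext, NULL) < 0){
        cerr << "find stream info failed\n";
        avformat_close_input(&pFormatContext);
        return -1;
    }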

    // find the first audio stream and a decoder for it
    int audio_stream_index = -1;
    const AVCodec *pCodec = NULL;
    AVCodecParameters *pCodecParameters = NULL;
    for(unsigned int i = 0; i < pFormatContext->nb_streams; ++i){
        if(pFormatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO){
            audio_stream_index = i;
            pCodecParameters = pFormatContext->streams[i]->codecpar;
            pCodec = avcodec_find_decoder(pCodecParameters->codec_id);
            break;
        }
    }
    if(audio_stream_index < 0 || pCodec == NULL){
        cerr << "no decodable audio stream found\n";
        return -1;
    }

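    // create the codec context, copy the stream parameters into it, and open the decoder
    AVCodecContext *pCodecContext = avcodec_alloc_context3(pCodec);
    if(pCodecContext == NULL ||
       avcodec_parameters_to_context(pCodecContext, pCodecParameters) < 0 ||
       avcodec_open2(pCodecContext, pCodec, NULL) < 0){
        cerr << "open decoder failed\n";
        return -1;
    }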

    AVPacket *pPacket = av_packet_alloc();
    AVFrame *pFrame = av_frame_alloc();

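    const int output_channel = 2;

    // some inputs leave channel_layout at 0; fall back to the default layout for the channel count
    // (channels and channel_layout are the older channel layout fields this tutorial is written against)
    const int64_t in_channel_layout = pCodecContext->channel_layout != 0
        ? (int64_t)pCodecContext->channel_layout
        : av_get_default_channel_layout(pCodecContext->channels);

    // resampler: keep the sample rate, convert to stereo planar float
    struct SwrContext* swr = swr_alloc();
    av_opt_set_int(swr, "in_channel_count",  pCodecContext->channels, 0);
    av_opt_set_int(swr, "out_channel_count", output_channel, 0);
    av_opt_set_int(swr, "in_channel_layout", in_channel_layout, 0);
    av_opt_set_int(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
    av_opt_set_int(swr, "in_sample_rate", pCodecContext->sample_rate, 0);
    av_opt_set_int(swr, "out_sample_rate", pCodecContext->sample_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt", pCodecContext->sample_fmt, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
    if(swr_init(swr) < 0){
        cerr << "init resampler failed\n";
        return -1;
    }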

    uint8_t* internal_buffer[output_channel];

    std::fstream out("outfile.pcm", std::ios::out | std::ios::binary);

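    // read packets until the end of the file
    for(;av_read_frame(pFormatContext, pPacket) >= 0;){
        if(pPacket->stream_index == audio_stream_index){
            // send the compressed packet to the decoder
            int ret = avcodec_send_packet(pCodecContext, pPacket);

            // one packet may produce zero or more frames
            for(;ret >= 0;){
                ret = avcodec_receive_frame(pCodecContext, pFrame);

                if(ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) break;
                else if(ret < 0) {
                    cerr << "error while receiving a frame from decoder" << endl;
                    break;
                }

                // allocate one plane per output channel, then convert to planar float
                if(av_samples_alloc(internal_buffer, NULL, output_channel, pFrame->nb_samples, AV_SAMPLE_FMT_FLTP, 0) < 0) break;
                int converted = swr_convert(swr, internal_buffer, pFrame->nb_samples, (const uint8_t**) pFrame->data, pFrame->nb_samples);

                // write only the left channel (plane 0); converted is the number of samples per channel
                if(converted > 0) out.write((char*)(internal_buffer[0]), sizeof(float) * converted);

                // the planes share one allocation, so freeing the first pointer releases all of them
                av_freep(&internal_buffer[0]);
            }
        }

        // release the packet data for every packet, not only the audio ones
        av_packet_unref(pPacket);
    }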


    swr_free(&swr);
    av_packet_free(&pPacket);
    av_frame_free(&pFrame);
    avcodec_free_context(&pCodecContext);
    avformat_close_input(&pFormatContext);

    return 0;

}

First we open the file:

AVFormatContext *pFormatContext = avformat_alloc_context();
avformat_open_input(&pFormatContext, path.c_str(), NULL, NULL);

Next we find the audio stream information:

avformat_find_stream_info(pFormatContext, NULL);
for(int i = 0; i < pFormatContext->nb_streams; ++i){
    if(pFormatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO){
        audio_stream_index = i;
        pCodecParameters = pFormatContext->streams[i]->codecpar;
        pCodec = avcodec_find_decoder(pCodecParameters->codec_id);
    }
}

Then we open the audio decoder:

AVCodecContext *pCodecContext = avcodec_alloc_context3(pCodec);
avcodec_parameters_to_context(pCodecContext, pCodecParameters);
avcodec_open2(pCodecContext, pCodec, NULL);

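Next we initialize a SwrContext for resampling, so that we can output the sample format, sample rate and channel count we want.
As the code shows, we ask for AV_SAMPLE_FMT_FLTP, that is, planar float, which is convenient for audio algorithms to process. We also output 2 channels.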

const int output_channel = 2;
struct SwrContext* swr = swr_alloc();
av_opt_set_int(swr, "in_channel_count",  pCodecContext->channels, 0);
av_opt_set_int(swr, "out_channel_count", output_channel, 0);
av_opt_set_int(swr, "in_channel_layout", pCodecContext->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_int(swr, "in_sample_rate", pCodecContext->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate", pCodecContext->sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", pCodecContext->sample_fmt, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
swr_init(swr);

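The most important step is decoding the audio.

First, av_read_frame reads one packet (still compressed)
Next, avcodec_send_packet and avcodec_receive_frame decode the packet into a frame
Then swr_convert resamples the frame into the format we want
Finally, out.write writes the left channel's PCM to the file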

av_read_frame(pFormatContext, pPacket);
int ret = avcodec_send_packet(pCodecContext, pPacket);
ret = avcodec_receive_frame(pCodecContext, pFrame);

swr_convert(swr, internal_buffer, pFrame->nb_samples, (const uint8_t**) pFrame->data, pFrame->nb_samples);
out.write((char*)(internal_buffer[0]), sizeof(float) * pFrame->nb_samples);
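
The file written this way contains only raw 32-bit float samples of the left channel in the machine's native byte order, with no header, so the sample rate has to be known from the input file. As a rough sketch of how such a file could be read back for inspection, assuming it was written on the same machine (the helper names `load_float_pcm` and `duration_seconds` are made up for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Load a headerless file of native-endian 32-bit float samples (one channel).
// Returns an empty vector if the file cannot be opened.
std::vector<float> load_float_pcm(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamsize bytes = in.tellg();   // opened at the end, so this is the file size
    in.seekg(0, std::ios::beg);
    std::vector<float> samples(static_cast<std::size_t>(bytes) / sizeof(float));
    in.read(reinterpret_cast<char*>(samples.data()),
            static_cast<std::streamsize>(samples.size() * sizeof(float)));
    return samples;
}

// The file stores no sample rate, so the caller must supply it.
double duration_seconds(std::size_t sample_count, int sample_rate)
{
    assert(sample_rate > 0);
    return static_cast<double>(sample_count) / sample_rate;
}
```

For example, `duration_seconds(load_float_pcm("outfile.pcm").size(), 44100)` gives the length of the decoded channel in seconds if the input was sampled at 44100 Hz.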

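Those are all the steps for decoding audio. The detail most worth discussing is how the decoded data is laid out.

The layout depends on the sample format. If the decoder outputs AV_SAMPLE_FMT_FLTP, that is:

pCodecContext->sample_fmt == AV_SAMPLE_FMT_FLTP

then the decoded data is planar: pFrame->data[0] holds the first channel, pFrame->data[1] holds the second channel, and so on. Each sample is sizeof(float) bytes.

What about AV_SAMPLE_FMT_FLT? That is:

pCodecContext->sample_fmt == AV_SAMPLE_FMT_FLT

The decoded data is interleaved: pFrame->data[0] holds the samples of all channels, and each sample is sizeof(float) bytes.

What about AV_SAMPLE_FMT_S16? That is:

pCodecContext->sample_fmt == AV_SAMPLE_FMT_S16

The decoded data is interleaved: pFrame->data[0] holds the samples of all channels, and each sample is sizeof(int16_t) bytes.

The same rules apply to the output of swr_convert.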
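
The layout rules above can be summed up in a small accessor. This is a sketch in plain C++ that only mimics the array of plane pointers found in AVFrame::data; `float_sample_at` is a hypothetical helper, and the `planar` flag stands in for checking whether the format is FLTP or FLT.

```cpp
#include <cassert>
#include <cstdint>

// Read sample `i` of channel `ch` from float audio stored as an array of
// plane pointers, the way decoded frame data is organized.
float float_sample_at(const std::uint8_t* const* data, bool planar,
                      int channels, int ch, int i)
{
    assert(ch >= 0 && ch < channels && i >= 0);
    if (planar) {
        // planar float: one plane per channel, samples contiguous inside a plane
        return reinterpret_cast<const float*>(data[ch])[i];
    }
    // interleaved float: a single plane, channels alternate sample by sample
    return reinterpret_cast<const float*>(data[0])[i * channels + ch];
}
```

For a 16-bit interleaved format the indexing is the same as the interleaved branch, only with int16_t in place of float.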

The complete code is on GitHub; stars are welcome:
https://github.com/jiemojiemo/ffmepg_audio_tutorial

Please credit the source when reposting: https://blog.csdn.net/weiwei9363/article/details/108565695
