Learning FFmpeg: Function Analysis of swr_convert

aworkholic

Category: Audio/Video Codecs. Tags: ffmpeg, libswresample, audio/video codecs, swr_convert

Article link: https://blog.csdn.net/wanggao_1990/article/details/115731502

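For source-code analysis of the main FFmpeg API functions, refer to the 雷神 (Lei Xiaohua) article series; an index is collected in "ffmpeg学习（2）获取和使用，源码分析" (FFmpeg Learning (2): Obtaining and Using FFmpeg, Source Analysis).

libswresample is mainly used for audio resampling and format conversion. It provides the following:

Sample rate conversion: changes the sample rate of the audio, for example from 44100 Hz down to 8000 Hz. Converting from a higher sample rate to a lower one is a lossy process.

Channel layout conversion: changes the channel layout of the audio, for example from stereo to mono. When the input channels cannot be mapped directly to the output, the process is lossy, because it involves mixing with different gain factors.

Sample format conversion: changes the sample format of the audio, for example from s16 (AV_SAMPLE_FMT_S16) PCM data to u8 (AV_SAMPLE_FMT_U8) or f32 (AV_SAMPLE_FMT_FLT) PCM data. It also converts between packed and planar layouts.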

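Brief overview

For an introduction to PCM, see the article "ffmpeg学习音频采样数据PCM" (FFmpeg Learning: PCM Audio Sample Data). Sample format and channel layout conversions can be done by hand in simple cases, as detailed in "ffmpeg学习（6）音频解码、音频数据处理" (FFmpeg Learning (6): Audio Decoding and Audio Data Processing). A short recap follows.

Sample format conversion

To convert samples from 32-bit float to unsigned 8-bit (unsigned char), map the value range to [0, 255]. The code assumes planar float input, with in_sample_bytes equal to sizeof(float).

for(int n = 0; n < frame->nb_samples; n++)
    for(int c = 0; c < frame->channels; c++) {
        float vsrc = *(float *)(frame->data[c] + n*in_sample_bytes);
        float v = vsrc*128 + 128;            // map [-1.0, 1.0] to [0, 256]
        if(v > 255) v = 255;                 // clamp: vsrc == 1.0 would otherwise wrap to 0
        if(v < 0)   v = 0;
        unsigned char vdst = (unsigned char)v;
        fwrite(&vdst, sizeof(unsigned char), 1, fpcm);
    }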

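To convert samples from 32-bit float to 16-bit short, map the value range to [-32768, 32767].

for(int n = 0; n < frame->nb_samples; n++)
    for(int c = 0; c < frame->channels; c++) {
        float vsrc = *(float *)(frame->data[c] + n*in_sample_bytes);
        float v = vsrc*32768;                // map [-1.0, 1.0] to [-32768, 32768]
        if(v > 32767)  v = 32767;            // clamp: vsrc == 1.0 would otherwise overflow
        if(v < -32768) v = -32768;
        short vdst = (short)v;
        fwrite(&vdst, sizeof(short), 1, fpcm);
    }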
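The two conversions above can be factored into small helpers that work on a single sample. This is only a minimal sketch: the names clampf, float_to_u8 and float_to_s16 are made up for illustration and are not part of FFmpeg, and the fractional part is truncated just as in the loops above.

```c
#include <stdint.h>

/* Hypothetical helper: limit x to the range [lo, hi]. */
static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Float sample in [-1.0, 1.0] -> unsigned 8-bit, centered on 128. */
uint8_t float_to_u8(float v)
{
    return (uint8_t)clampf(v * 128.0f + 128.0f, 0.0f, 255.0f);
}

/* Float sample in [-1.0, 1.0] -> signed 16-bit. */
int16_t float_to_s16(float v)
{
    return (int16_t)clampf(v * 32768.0f, -32768.0f, 32767.0f);
}
```

With these helpers, silence (0.0) maps to 128 in u8 and to 0 in s16, and out-of-range input saturates instead of wrapping around.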

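Channel layout conversion

When going from fewer channels to more, the data of one channel can be duplicated. When going from more channels to fewer, you can simply keep the channels you need.
To save the original 2 channels as 1 channel, either keep one of them or take the average:

for(int n = 0; n < frame->nb_samples; n++) {
    float vdst = 0;
    for(int c = 0; c < frame->channels; c++) 
        vdst += *(float *)(frame->data[c] + n*in_sample_bytes);
    vdst /= frame->channels;   // average of all channels
    fwrite(&vdst, sizeof(float), 1, fpcm);
}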
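Both directions described above can be sketched on plain float buffers without any FFmpeg types. The function names downmix_to_mono and upmix_mono_to_stereo are hypothetical; the input of the downmix is assumed to be planar (one buffer per channel), and the upmix writes packed stereo.

```c
#include <stddef.h>

/* Average `channels` planar float channels into one mono buffer. */
void downmix_to_mono(const float *const *in, int channels,
                     size_t nb_samples, float *out)
{
    for (size_t n = 0; n < nb_samples; n++) {
        float sum = 0.0f;
        for (int c = 0; c < channels; c++)
            sum += in[c][n];
        out[n] = sum / (float)channels;
    }
}

/* Duplicate a mono buffer into a packed (interleaved) stereo buffer. */
void upmix_mono_to_stereo(const float *in, size_t nb_samples, float *out)
{
    for (size_t n = 0; n < nb_samples; n++) {
        out[2 * n]     = in[n];  /* left  */
        out[2 * n + 1] = in[n];  /* right */
    }
}
```

Plain averaging is only one possible choice of gains; it keeps the result within range but lowers the level of content that is present in only one channel.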

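Sample rate conversion

Only the case where the source rate is an integer multiple of the target rate is shown here, for example 48000 Hz to 8000 Hz: keep one out of every 6 input samples. Dropping samples without low-pass filtering first can introduce aliasing, so this is for illustration only. For example:

for(int n = 0; n < frame->nb_samples; n+=6)   // 48000 / 8000 = 6
    for(int c = 0; c < frame->channels; c++) {
        float vsrc = *(float *)(frame->data[c] + n*in_sample_bytes);
        float v = vsrc*128;                   // this example also converts to signed 8-bit
        if(v > 127)  v = 127;                 // clamp so vsrc == 1.0 does not overflow
        if(v < -128) v = -128;
        signed char vdst = (signed char)v;
        fwrite(&vdst, sizeof(signed char), 1, fpcm);
    }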

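Using the libswresample library

When the sample rate of the audio differs from the sample rate of the player, the audio must be resampled to play correctly; otherwise it may play at the wrong speed. (When one rate is not an integer multiple of the other, handling this by hand requires interpolation to fill in samples.) This section focuses on converting audio sample data with the libswresample library.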
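To illustrate what "interpolation" means for rates that do not divide evenly, here is a minimal sketch of linear-interpolation resampling for a single channel of float samples. The function resample_linear is hypothetical and is not how libswresample works internally; it applies no low-pass filter, so it is unsuitable for quality downsampling.

```c
#include <stddef.h>

/*
 * Resample one channel from in_rate to out_rate by linear interpolation.
 * Writes at most out_cap samples and returns the number written.
 */
size_t resample_linear(const float *in, size_t in_count, int in_rate,
                       float *out, size_t out_cap, int out_rate)
{
    size_t n = 0;
    if (in_count == 0 || in_rate <= 0 || out_rate <= 0)
        return 0;
    for (; n < out_cap; n++) {
        /* Position of output sample n on the input sample axis. */
        double pos = (double)n * in_rate / out_rate;
        size_t i = (size_t)pos;
        if (i >= in_count)
            break;                       /* ran past the end of the input */
        double frac = pos - (double)i;
        float a = in[i];
        /* Hold the last sample when there is no right-hand neighbour. */
        float b = (i + 1 < in_count) ? in[i + 1] : in[i];
        out[n] = (float)(a + (b - a) * frac);
    }
    return n;
}
```

For 44100 Hz to 48000 Hz, each output sample falls between two input samples and is computed as a weighted mix of them, which is the step the simple "keep every Nth sample" approach cannot do.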

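Workflow
(1) Create a SwrContext object
(2) Call swr_convert() to convert the sample data
(3) Free the SwrContext object

As with SwsContext, there are two ways to create and initialize a SwrContext object:

Method 1: call SwrContext *swr = swr_alloc_set_opts(…), then call swr_init(swr). Note that recent FFmpeg releases deprecate swr_alloc_set_opts() in favor of swr_alloc_set_opts2().

Method 2: call SwrContext *swr = swr_alloc(), set each parameter with av_opt_set_xxxx(), and finally call swr_init(swr).

Usually the first method is used for the initial setup. If parameters need to change afterwards, call av_opt_set_xxxx() again and then re-run swr_init(swr).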

Function reference

Initializing and configuring the SwrContext object

/**
 * Allocate SwrContext if needed and set/reset common parameters.
 *
 * This function does not require s to be allocated with swr_alloc(). On the
 * other hand, swr_alloc() can use swr_alloc_set_opts() to set the parameters
 * on the allocated context.
 *
 * @param s               existing Swr context if available, or NULL if not
 * @param out_ch_layout   output channel layout (AV_CH_LAYOUT_*)
 * @param out_sample_fmt  output sample format (AV_SAMPLE_FMT_*).
 * @param out_sample_rate output sample rate (frequency in Hz)
 * @param in_ch_layout    input channel layout (AV_CH_LAYOUT_*)
 * @param in_sample_fmt   input sample format (AV_SAMPLE_FMT_*).
 * @param in_sample_rate  input sample rate (frequency in Hz)
 * @param log_offset      logging level offset
 * @param log_ctx         parent logging context, can be NULL
 *
 * @see swr_init(), swr_free()
 * @return NULL on error, allocated context otherwise
 */
struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,
                                      int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
                                      int64_t  in_ch_layout, enum AVSampleFormat  in_sample_fmt, int  in_sample_rate,
                                      int log_offset, void *log_ctx);

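Sample conversion: the parameters are the output and input buffer pointers together with their sample counts per channel, and the return value is the number of samples per channel actually produced. Passing NULL as input (with in_count set to 0) flushes the data buffered inside the context.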

/** Convert audio.
 *
 * in and in_count can be set to 0 to flush the last few samples out at the
 * end.
 *
 * If more input is provided than output space, then the input will be buffered.
 * You can avoid this buffering by using swr_get_out_samples() to retrieve an
 * upper bound on the required number of output samples for the given number of
 * input samples. Conversion will run directly without copying whenever possible.
 *
 * @param s         allocated Swr context, with parameters set
 * @param out       output buffers, only the first one need be set in case of packed audio
 * @param out_count amount of space available for output in samples per channel
 * @param in        input buffers, only the first one need to be set in case of packed audio
 * @param in_count  number of input samples available in one channel
 *
 * @return number of samples output per channel, negative value on error
 */
int swr_convert(struct SwrContext *s, uint8_t **out, int out_count,
                                const uint8_t **in , int in_count);

Other related functions

int av_get_bytes_per_sample(enum AVSampleFormat sample_fmt); // number of bytes per sample
int av_sample_fmt_is_planar(enum AVSampleFormat sample_fmt); // whether the sample format is planar
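The difference between packed and planar data that av_sample_fmt_is_planar() reports can be made concrete with a small sketch. In packed stereo, samples are stored as L R L R ... in one buffer; in planar, each channel has its own buffer. The helper deinterleave_s16 below is hypothetical and only illustrates the memory layout for 16-bit samples.

```c
#include <stddef.h>
#include <stdint.h>

/* Split packed (interleaved) 16-bit samples into one buffer per channel. */
void deinterleave_s16(const int16_t *packed, size_t nb_samples, int channels,
                      int16_t *const *planar)
{
    for (size_t n = 0; n < nb_samples; n++)
        for (int c = 0; c < channels; c++)
            planar[c][n] = packed[n * (size_t)channels + (size_t)c];
}
```

This is also why swr_convert() only needs the first pointer for packed audio, while planar audio needs one pointer per channel.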

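Example code

The input PCM file is 16-bit, 44100 Hz, stereo (packed). The output is 32-bit integer, 48000 Hz, stereo: swr_convert produces planar data (AV_SAMPLE_FMT_S32P), and the write loop interleaves it sample by sample when saving it to the output PCM file.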

#include <stdio.h>
#include <stdint.h>

#ifdef __cplusplus  
extern "C" {
#endif  

#include "libswresample/swresample.h"

#include "libavutil/channel_layout.h"
#include "libavutil/frame.h"
#include "libavutil/mathematics.h"
#include "libavutil/mem.h"
#include "libavutil/opt.h"
#include "libavutil/samplefmt.h"

#ifdef __cplusplus  
}
#endif 


int main()
{
    // Input file and parameters
    FILE *in_file = fopen("../files/Titanic_44100_s16_stero.pcm", "rb");
    const int in_sample_rate = 44100;
    enum AVSampleFormat in_sfmt = AV_SAMPLE_FMT_S16;  // input is interleaved (packed), not planar
    uint64_t in_channel_layout = AV_CH_LAYOUT_STEREO;
    int in_channels = av_get_channel_layout_nb_channels(in_channel_layout);
    const int in_nb_samples = 2048;

    int in_spb = av_get_bytes_per_sample(in_sfmt);


    // Output file and parameters
    FILE *out_file = fopen("out.pcm", "wb");
    const int out_sample_rate = 48000;
    enum AVSampleFormat out_sfmt = AV_SAMPLE_FMT_S32P;
    uint64_t out_channel_layout = AV_CH_LAYOUT_STEREO;
    int out_channels = av_get_channel_layout_nb_channels(out_channel_layout);
    int out_nb_samples = (int)av_rescale_rnd(in_nb_samples, out_sample_rate, in_sample_rate, AV_ROUND_UP);

    int out_spb = av_get_bytes_per_sample(out_sfmt);

    if(!in_file || !out_file) {
        fprintf(stderr, "Failed to open input or output file\n");
        return 1;
    }

    // Use AVFrame to hold the PCM buffers used for conversion
    AVFrame *in_frame = av_frame_alloc();
    av_samples_alloc(in_frame->data, in_frame->linesize, in_channels, in_nb_samples, in_sfmt, 1);

    AVFrame *out_frame = av_frame_alloc();
    av_samples_alloc(out_frame->data, out_frame->linesize, out_channels, out_nb_samples, out_sfmt, 1);

    // swr context, alternative setup (method 2)
    //SwrContext *swr_ctx = swr_alloc();
    //av_opt_set_channel_layout(swr_ctx, "in_channel_layout", in_channel_layout, 0);
    //av_opt_set_channel_layout(swr_ctx, "out_channel_layout", out_channel_layout, 0);
    //av_opt_set_int(swr_ctx, "in_sample_rate", in_sample_rate, 0);
    //av_opt_set_int(swr_ctx, "out_sample_rate", out_sample_rate, 0);
    //av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", in_sfmt, 0);
    //av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", out_sfmt, 0);
    //swr_init(swr_ctx);

    // swr context (method 1)
    SwrContext *swr_ctx = NULL;
    swr_ctx = swr_alloc_set_opts(swr_ctx, 
                                 out_channel_layout, out_sfmt, out_sample_rate, 
                                 in_channel_layout, in_sfmt, in_sample_rate, 0, NULL);
    if(!swr_ctx || swr_init(swr_ctx) < 0) {
        fprintf(stderr, "Failed to initialize SwrContext\n");
        return 1;
    }

    // To change parameters later:
    //av_opt_set_int(swr_ctx, "in_sample_rate", in_sample_rate, 0);
    //swr_init(swr_ctx);


    // Convert and save
    int frameCnt = 0;

    while(1) {  
        // read samples (one element = one sample for all channels)
        int read_samples = (int)fread(in_frame->data[0], in_spb*in_channels, in_nb_samples, in_file);
        if(read_samples <= 0)
           break;

        // prepare: upper bound on output samples, including data buffered in swr_ctx
        int dst_nb_samples = (int)av_rescale_rnd(
            swr_get_delay(swr_ctx, in_sample_rate) + in_nb_samples,
            out_sample_rate,
            in_sample_rate, AV_ROUND_UP);

        if(dst_nb_samples > out_nb_samples) {
            // free the old buffer (allocated by av_samples_alloc) and allocate a larger one
            av_freep(&out_frame->data[0]);

            out_nb_samples = dst_nb_samples;

            av_samples_alloc(out_frame->data, out_frame->linesize, out_channels, out_nb_samples, out_sfmt, 1);
        }

        // convert
        int out_samples = swr_convert(swr_ctx, 
                                      out_frame->data, out_nb_samples,
                                      (const uint8_t**)in_frame->data, read_samples);
        if(out_samples < 0) {
            fprintf(stderr, "swr_convert failed\n");
            break;
        }

        // write
        if(av_sample_fmt_is_planar(out_sfmt)) { // planar: interleave the channels while writing
            for(int i = 0; i < out_samples; i++) {
                for(int c = 0; c < out_channels; c++)
                    fwrite(out_frame->data[c] + i*out_spb, 1, out_spb, out_file);
            }
        }
        else {  // packed
            fwrite(out_frame->data[0], out_spb*out_channels, out_samples, out_file);
        }

        printf("Succeed to convert frame %4d, samples [%d]->[%d]\n", frameCnt++, read_samples, out_samples);
    }

    // flush swr: drain the samples still buffered in the context
    printf("Flush samples \n");
    int out_samples;
    do {
        // convert with NULL input
        out_samples = swr_convert(swr_ctx, 
                                  out_frame->data, out_nb_samples, 
                                  NULL, 0);
        if(out_samples <= 0)
            break;

        // write
        if(av_sample_fmt_is_planar(out_sfmt)) { 
            for(int i = 0; i < out_samples; i++) {
                for(int c = 0; c < out_channels; c++)
                    fwrite(out_frame->data[c] + i*out_spb, 1, out_spb, out_file);
            }
        }
        else {
            fwrite(out_frame->data[0], out_spb*out_channels, out_samples, out_file);
        }

        printf("Succeed to convert frame %d samples %d\n", frameCnt++, out_samples);
    }
    while(out_samples > 0);


    // free: buffers from av_samples_alloc are not reference counted, so release them explicitly
    av_freep(&in_frame->data[0]);
    av_freep(&out_frame->data[0]);
    av_frame_free(&in_frame);
    av_frame_free(&out_frame);

    swr_free(&swr_ctx);

    fclose(in_file);
    fclose(out_file);
    return 0;
}

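Number of output samples
When the output sample rate changes, the number of samples per channel changes accordingly: a higher rate yields more samples, a lower rate yields fewer. It is computed as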

int out_nb_samples = av_rescale_rnd(in_nb_samples, out_sample_rate, in_sample_rate, AV_ROUND_UP);
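The call above evaluates in_nb_samples * out_sample_rate / in_sample_rate, rounded up. The arithmetic can be sketched in plain C as follows; rescale_round_up is a hypothetical stand-in for illustration only and, unlike av_rescale_rnd(), it does not guard against overflow of the intermediate product.

```c
#include <stdint.h>

/* ceil(a * b / c) for non-negative a and b and positive c. */
int64_t rescale_round_up(int64_t a, int64_t b, int64_t c)
{
    return (a * b + c - 1) / c;
}
```

For the example in this article, 2048 input samples at 44100 Hz correspond to 2048 * 48000 / 44100 ≈ 2229.1 output samples at 48000 Hz, which rounds up to 2230.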

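Computing the number of converted samples

In practice the number of input samples may vary or arrive late. When more input is supplied than the output buffer can take, swr_ctx buffers it internally; if that data is not drained in time it can pile up and affect the output (for example in live streaming).
In that case the output buffer has to be reallocated so that it can hold both the currently converted data and the buffered data:

int dst_nb_samples = (int)av_rescale_rnd(
            swr_get_delay(swr_ctx, in_sample_rate) + in_nb_samples,
            out_sample_rate,
            in_sample_rate, AV_ROUND_UP);
if(dst_nb_samples > out_nb_samples) {
    // free the old buffer and allocate a larger one
    av_freep(&out_frame->data[0]);
    out_nb_samples = dst_nb_samples;
    av_samples_alloc(out_frame->data, out_frame->linesize, out_channels, out_nb_samples, out_sfmt, 1);
}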

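Calling swr_convert and handling the result

The output buffer and the output sample capacity passed to swr_convert are the dynamically adjusted values. When processing the converted data, always go by the return value of swr_convert.

For example, if the number of samples actually produced is out_samples, the subsequent handling is

    // write
    if(av_sample_fmt_is_planar(out_sfmt)) { // planar
        for(int i = 0; i < out_samples; i++) {
            for(int c = 0; c < out_channels; c++)
                 fwrite(out_frame->data[c] + i*out_spb, 1, out_spb, out_file);
        }
    }
    else {  // packed
        fwrite(out_frame->data[0], out_spb*out_channels, out_samples, out_file);
    }

The output of the final flush is handled in the same way.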

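Run results

(Screenshots of the program output and of the Audacity import dialog are not preserved.)
Loading the output PCM file into Audacity with the import parameters set to match the output (32-bit signed PCM, 2 channels, 48000 Hz) shows that the data is correct.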
